kubostat2017e: 2017 (e) GLM and logistic regression for count data of the form "y out of N"


Transcription:

kubostat2017e (https://goo.gl/z9ycjy), 2017 (e): GLM, 47 slides
kubo@ees.hokudai.ac.jp

Agenda
1. Count data of the form "y out of N"
2. Binomial distribution and logit link function
3. Interaction terms in the linear predictor
4. Stop dividing data by data: the offset term

Statistical models that appear in the class
  Textbook, chapter 6 (GLM): http://goo.gl/ufq2

The development of linear models
  Linear model: parameter estimation by MSE
  Generalized Linear Model (GLM): parameter estimation by MLE. "Always normal distribution? That's nonsense!"
  Generalized Linear Mixed Model (GLMM): parameter estimation by MLE. Incorporating random effects such as individuality.
  Hierarchical Bayesian Model: parameter estimation by MCMC. Be more flexible.
  Kubo Doctrine: first, learn the evolution of the linear-model family!

How to specify a GLM
  The Generalized Linear Model (GLM) family includes Poisson regression, logistic regression and linear regression.
  To specify a GLM, answer three questions:
    Which probability distribution?
    Which linear predictor?
    Which link function?

How to specify a Poisson regression model, a GLM
  Probability distribution: Poisson distribution
  Linear predictor: e.g., $\beta_1 + \beta_2 x_i$
  Link function: log link function

How to specify a logistic regression model, a GLM
  Probability distribution: binomial distribution
  Linear predictor: e.g., $\beta_1 + \beta_2 x_i$
  Link function: logit link function
  [Figure: y_i plotted against x_i]

1. Count data of the form "y out of N"

Example: seeds alive out of 8
  For each individual i, N_i = 8 seeds were examined and y_i of them were alive (for example y_i = 3 alive, the rest dead), so $y_i \in \{0, 1, 2, \ldots, 8\}$.
  Explanatory variables: body size x_i and fertilization f_i (C: control, T: treatment).

Reading the data file
  data4a.csv is a CSV (comma separated value) format file. Read it into the data frame d in R:

  > d <- read.csv("data4a.csv")

  or

  > d <- read.csv(
  +   "http://hosho.ees.hokudai.ac.jp/~kubo/stat/205/fig/binomial/data4a.csv")

  d is a data frame (a table-like object):

  > summary(d)
         N           y              x           f
   Min.   :8   Min.   :0.00   Min.   : 7.660   C:50
   1st Qu.:8   1st Qu.:3.00   1st Qu.: 9.338   T:50
   Median :8   Median :6.00   Median : 9.965
   Mean   :8   Mean   :5.08   Mean   : 9.967
   3rd Qu.:8   3rd Qu.:8.00   3rd Qu.:10.770
   Max.   :8   Max.   :8.00   Max.   :12.440
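When data4a.csv is not at hand, a data frame with the same columns can be simulated in order to try the later steps. The sketch below is only a stand-in: the name d_sim, the random seed and the parameter values used to generate y are assumptions made for illustration, not the lecture data.

```r
# Simulated stand-in with the same columns as data4a.csv (NOT the lecture data).
set.seed(1)
n <- 100

d_sim <- data.frame(
  N = rep(8L, n),                              # seeds examined per individual
  x = round(rnorm(n, mean = 10, sd = 1), 2),   # body size
  f = factor(rep(c("C", "T"), each = n / 2))   # fertilization: C or T
)

# Hypothetical parameter values, used only to generate y.
q <- plogis(-19.5 + 1.95 * d_sim$x + 2.0 * (d_sim$f == "T"))
d_sim$y <- rbinom(n, size = d_sim$N, prob = q)  # seeds alive, between 0 and 8

summary(d_sim)
```

plogis() is the logistic function built into R, so q always lies between 0 and 1 and y can never exceed N.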

Plotting the "y out of N" data

  > plot(d$x, d$y, pch = c(21, 19)[d$f])
  > legend("topleft", legend = c("C", "T"), pch = c(21, 19))

  [Figure: y_i plotted against x_i, with separate symbols for C and T]
  Is fertilization effective?

2. Binomial distribution and logit link function

Binomial distribution: the probability of "y out of N"

  $p(y \mid N, q) = \binom{N}{y} q^{y} (1 - q)^{N - y}$

  $\binom{N}{y}$ is the number of ways to choose y items out of N.
  [Figure: $p(y_i \mid 8, q)$ against $y_i$ for q = 0.1, q = 0.3 and q = 0.8]

Logistic curve

  $q_i = \mathrm{logistic}(z_i) = \frac{1}{1 + \exp(-z_i)}$

  where $z_i$ is the linear predictor, e.g. $z_i = \beta_1 + \beta_2 x_i$.

  > logistic <- function(z) 1 / (1 + exp(-z)) # define the function
  > z <- seq(-6, 6, 0.1)
  > plot(z, logistic(z), type = "l")

  [Figure: q = 1 / (1 + exp(-z)) against z, for z from -6 to 6]

$\beta_1$ and $\beta_2$ change the logistic curve
  Starting from $\{\beta_1, \beta_2\} = \{0, 2\}$:
  (A) $\beta_2 = 2$ fixed, $\beta_1$ varied: curves for $\beta_1 = 2$, $\beta_1 = 0$ and $\beta_1 = -3$
  (B) $\beta_1 = 0$ fixed, $\beta_2$ varied: curves including $\beta_2 = 4$ and $\beta_2 = 2$
  [Figure: q against x in panels (A) and (B), for x from -3 to 3]
  Whatever the values of $\{\beta_1, \beta_2\}$ and x, q stays within $0 \le q \le 1$.

Logit link function
  logistic function:

  $q = \frac{1}{1 + \exp(-(\beta_1 + \beta_2 x))} = \mathrm{logistic}(\beta_1 + \beta_2 x)$

  logit transformation:

  $\mathrm{logit}(q) = \log \frac{q}{1 - q} = \beta_1 + \beta_2 x$

  The logit function is the inverse of the logistic function, and vice versa.
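The two formulas of this part can be checked numerically. The sketch below compares the binomial probability written out by hand with R's built-in dbinom(), and confirms that logit undoes logistic; the helper name logit and the object names are chosen here for illustration.

```r
# logistic as defined in the slides, and its inverse, the logit
logistic <- function(z) 1 / (1 + exp(-z))
logit <- function(q) log(q / (1 - q))

z <- seq(-6, 6, by = 0.1)
max_roundtrip_error <- max(abs(logit(logistic(z)) - z))
max_roundtrip_error   # close to zero: logit is the inverse of logistic

# Binomial probability of "y out of N", written out term by term
N <- 8
y <- 0:N
pmf_matches <- sapply(c(0.1, 0.3, 0.8), function(q) {
  by_hand <- choose(N, y) * q^y * (1 - q)^(N - y)
  isTRUE(all.equal(by_hand, dbinom(y, size = N, prob = q)))
})
pmf_matches           # one TRUE per value of q
```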

MLE for $\beta_1$ and $\beta_2$
  Estimating the parameters in R:

  > glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)
  ...
  Coefficients:
  (Intercept)            x           fT
      -19.536        1.952        2.022

  [Figure: y_i against x_i with the fitted curves; panel (A) f_i = C, panel (B) f_i = T]

3. Interaction terms in the linear predictor

  $\mathrm{logit}(q) = \log \frac{q}{1 - q} = \beta_1 + \beta_2 x + \beta_3 f + \beta_4 x f$

  ... in case that $\beta_4 < 0$, sometimes it predicts ...
  [Figure: predicted y against x (8 to 12) for T and C]

In today's example: no interaction effect
  (A) glm(y ~ x + f, ...)
  (B) glm(y ~ x + f + x:f, ...)
  [Figure: predicted y against x (8 to 12) for T and C, panels (A) and (B)]
  There is little difference between the two predictions.

4. Stop dividing data by data: the offset term
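The fit with and without the interaction term can be tried on simulated data. This is a minimal sketch, assuming a simulated data frame d_sim with hypothetical generating parameters (no interaction in the "true" model); the use of AIC() to compare the two fits is a choice made here, not something stated in the slides.

```r
# Simulated "y out of N" data (NOT the lecture data); generating values are assumptions.
set.seed(42)
n <- 100
d_sim <- data.frame(
  N = 8L,
  x = rnorm(n, mean = 10, sd = 1),
  f = factor(rep(c("C", "T"), each = n / 2))
)
d_sim$y <- rbinom(n, size = d_sim$N,
                  prob = plogis(-19.5 + 1.95 * d_sim$x + 2 * (d_sim$f == "T")))

# (A) additive model and (B) model with the x:f interaction
fit_add <- glm(cbind(y, N - y) ~ x + f, data = d_sim, family = binomial)
fit_int <- glm(cbind(y, N - y) ~ x + f + x:f, data = d_sim, family = binomial)

coef(fit_add)           # (Intercept), x, fT
AIC(fit_add, fit_int)   # one convenient way to compare the two fits

# Predicted survival probability at x = 10 for each fertilization level
new_d <- data.frame(x = 10, f = factor(c("C", "T"), levels = c("C", "T")))
predict(fit_add, newdata = new_d, type = "response")
```

The response is given as cbind(y, N - y), that is, (alive, dead) counts, which is how glm() is told both y and N for a binomial model.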

How to avoid data/data values?
  A value obtained by dividing one observation by another throws information away. For example, is 3 out of 10 really the same as 60 out of 200, just because both give 0.3?

  Avoidable data/data values:
    Probability ("y out of N"): use a statistical model with the binomial distribution.
    Indices such as densities, e.g. specific leaf area (SLA): use the offset term (described later).

Unfortunately, sometimes fractions appear...
  Hard to avoid: outputs from some measuring machines.
  Sometimes we have no choice but to plot data/data values...

Offset example: population densities in research plots
  Light intensity index x: light index in {0.1, 0.2, ..., 1.0}
  Response: the number of plants in each plot, modelled with glm(..., family = poisson)

What? Differences in plot size?!
  The research plots differ in area A, so the counts are not directly comparable.
  Do not analyse the density (number of plants / A) directly. Use the offset option of glm() instead.

  R data.frame: Area (plot area), x (light index), y (number of plants)

  > load("d2.rdata")
  > head(d, 8) # first 8 rows

  The output lists the columns Area, x and y for the first eight plots.
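One way to see what a ratio hides is to compare two samples with the same observed proportion but different sizes. The sketch below uses binom.test() from base R on 3 out of 10 and 60 out of 200; reading the slide's numbers this way, and the object names small and large, are assumptions made for illustration.

```r
# Same observed proportion (0.3), very different amounts of information
small <- binom.test(3, 10)     # 3 successes out of 10
large <- binom.test(60, 200)   # 60 successes out of 200

small$estimate; large$estimate   # both equal 0.3

# 95% confidence intervals for the underlying probability
small_ci <- as.numeric(small$conf.int)
large_ci <- as.numeric(large$conf.int)
diff(small_ci)   # wide interval
diff(large_ci)   # much narrower interval
```

Once the counts are collapsed to 0.3 the difference in precision is lost, which is why a binomial model fitted to y and N is preferable to analysing y / N.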

Light index x vs density, and area A vs number of plants y

  > plot(d$x, d$y / d$Area)
  > plot(d$Area, d$y)

  [Figure: left panel d$y/d$Area against d$x; right panel d$y against d$Area]
  The number of plants y increases with area A.

Showing the light index x by symbol size

  > plot(d$Area, d$y, cex = d$x * 2)

  [Figure: d$y against d$Area, with symbol size proportional to x]
  Does y also increase with the light index x?

Offset term = a device in the GLM for avoiding division
  1. The number of plants $y_i$ in plot i follows a Poisson distribution with mean $\lambda_i$:

     $y_i \sim \mathrm{Pois}(\lambda_i)$

  2. The mean $\lambda_i$ is proportional to the area $A_i$ and depends on the light index $x_i$:

     $\lambda_i = A_i \exp(\beta_1 + \beta_2 x_i)$
     $\lambda_i = \exp(\beta_1 + \beta_2 x_i + \log(A_i))$
     $\log(\lambda_i) = \beta_1 + \beta_2 x_i + \log(A_i)$

     $\log(A_i)$ is the offset term: a term in the linear predictor that has no coefficient $\beta$.

Specifying the model
  family: poisson, link "log"
  model formula: y ~ x
  offset: log(Area)
  linear predictor: $z = \beta_1 + \beta_2 x + \log(\mathrm{Area})$, where $\beta_1$ and $\beta_2$ are the parameters to be estimated
  link function: $\log(\lambda) = z$, so $\lambda = \exp(z) = \exp(\beta_1 + \beta_2 x + \log(\mathrm{Area}))$
  $\lambda$: the mean number of plants
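The two ways of writing the mean, with the area as a multiplier or with log(area) as an offset inside the exponent, can be confirmed to agree numerically. A small sketch, with hypothetical parameter values b1 and b2 and function names chosen here for illustration:

```r
# Mean number of plants: area as a multiplier ...
lambda_product <- function(area, x, b1, b2) area * exp(b1 + b2 * x)
# ... or log(area) as an offset in the linear predictor
lambda_offset <- function(area, x, b1, b2) exp(b1 + b2 * x + log(area))

# Hypothetical parameter values, for illustration only
b1 <- 0.3
b2 <- 1.1

area <- c(0.5, 1, 2, 3)
x <- 0.9

same_mean <- all.equal(lambda_product(area, x, b1, b2),
                       lambda_offset(area, x, b1, b2))
same_mean   # TRUE: the two forms are the same model

# Doubling the area doubles the expected count at the same light index
ratio <- lambda_offset(2, x, b1, b2) / lambda_offset(1, x, b1, b2)
ratio
```

Because the offset enters with a fixed coefficient of 1, the expected count is exactly proportional to area, which is the assumption behind working with densities in the first place.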

Using glm() with an offset in R

  > fit <- glm(y ~ x, family = poisson(link = "log"), data = d, offset = log(Area))
  > print(summary(fit))

  Call:
  glm(formula = y ~ x, family = poisson(link = "log"), data = d,
      offset = log(Area))
  (......)
  Coefficients:
              Estimate Std. Error z value Pr(>|z|)
  (Intercept)    0.321      0.160    2.01    0.044
  x              1.090      0.227    4.80  1.6e-06

Plotting the model prediction based on the estimates
  [Figure: d$y against d$Area, with prediction lines for x = 0.9 (light environment) and x = 0.1 (dark environment); solid lines: prediction by glm(); dotted lines: true model]

Summary: improve your statistical model and remove data/data values!
  Avoidable data/data values:
    Probability ("y out of N"): use a statistical model with the binomial distribution.
    Indices such as densities, e.g. specific leaf area (SLA): use the offset term.

The next topic
  "y out of N" data and the Hierarchical Bayesian Model (HBM)
  [Figure: distribution of y_i]
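The offset fit can be reproduced on simulated data when d2.rdata is not available. This is a sketch under stated assumptions: d_sim, the seed and the generating parameters are made up here, and the second fit uses the formula form offset(log(Area)), which is an alternative to the offset = argument shown in the slides.

```r
# Simulated plots (NOT the lecture data); generating values are assumptions.
set.seed(7)
n <- 100
d_sim <- data.frame(
  Area = runif(n, min = 0.1, max = 3),
  x = sample(seq(0.1, 1.0, by = 0.1), n, replace = TRUE)
)
d_sim$y <- rpois(n, lambda = d_sim$Area * exp(0.3 + 1.1 * d_sim$x))

# Offset given as an argument, as in the slides
fit_arg <- glm(y ~ x, family = poisson(link = "log"), data = d_sim,
               offset = log(Area))
# Offset given inside the formula
fit_formula <- glm(y ~ x + offset(log(Area)), family = poisson(link = "log"),
                   data = d_sim)

same_coef <- all.equal(coef(fit_arg), coef(fit_formula))
same_coef   # TRUE: both forms describe the same model

# Predicted mean number of plants for plots of area 1 and 2 at x = 0.9
new_plots <- data.frame(Area = c(1, 2), x = 0.9)
pred <- predict(fit_formula, newdata = new_plots, type = "response")
pred
```

The formula form is used for predict() here so that the offset is unambiguously recomputed from the Area column of the new data.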

A preview of continuous probability distributions, used to construct Hierarchical Bayesian Models
  Discrete probability distributions? Continuous probability distributions?

Discrete probability distributions (review)
  The probability distribution is the core of a statistical model.

  Poisson distribution, with mean $\lambda$; $\lambda$ changes the shape of the distribution:

  $p(y \mid \lambda) = \frac{\lambda^{y} \exp(-\lambda)}{y!}$

  Binomial distribution, the probability of "y out of N":

  $p(y \mid N, q) = \binom{N}{y} q^{y} (1 - q)^{N - y}$

  [Figure: $p(y_i \mid 8, q)$ against $y_i$ for q = 0.1, q = 0.3 and q = 0.8]

Uniform distribution (continuous): an important device for HBM
  Parameters: min (a) and max (b)

  $f(x) = \frac{1}{b - a}$ for $a \le x \le b$

  [Figure: f(x) against x, constant at height 1 / (b - a) between a and b]

The normal or Gaussian distribution: an important device for HBM
  Parameters: mean ($\mu$) and standard deviation (SD) ($s > 0$)
  With mean $\mu = 0$:

  $p(x \mid s) = \frac{1}{\sqrt{2 \pi s^2}} \exp\left( -\frac{x^2}{2 s^2} \right)$

  [Figure: p(x | s) against x for s = 1.0, s = 1.5 and s = 3.0]
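Each density above has a built-in counterpart in R (dpois, dunif, dnorm), so the formulas can be checked directly. A short sketch, with illustrative parameter values chosen here:

```r
# Poisson: p(y | lambda) = lambda^y * exp(-lambda) / y!
y <- 0:10
lambda <- 3.5
pois_ok <- all.equal(dpois(y, lambda),
                     lambda^y * exp(-lambda) / factorial(y))

# Uniform on [a, b]: constant density 1 / (b - a) inside the interval
a <- 2
b <- 5
unif_ok <- all.equal(dunif(3, min = a, max = b), 1 / (b - a))

# Normal with mean 0 and SD s
s <- 1.5
x <- seq(-5, 5, by = 0.5)
norm_ok <- all.equal(dnorm(x, mean = 0, sd = s),
                     exp(-x^2 / (2 * s^2)) / sqrt(2 * pi * s^2))

# A continuous density integrates to 1 over its range
total <- integrate(dnorm, lower = -Inf, upper = Inf, mean = 0, sd = s)$value

c(pois_ok, unif_ok, norm_ok)
total
```

For the discrete distributions the values are probabilities and sum to 1; for the continuous ones they are densities, so it is the area under the curve that equals 1.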