2012 11 01 k3 (2012-10-24 14:07 ) 1 6 3 (2012 11 01 k3) kubo@ees.hokudai.ac.jp web http://goo.gl/wijx2 web http://goo.gl/ufq2 1 3 2 : 4 3 AIC 6 4 7 5 8 6 : 9 7 11 8 12 8.1 (1)........ 13 8.2 (2) χ 2.................... 16 9 17 10 18
2012 11 01 k3 (2012-10-24 14:07 ) 2 (A) k = 1 (B) k = 7 y 2 4 6 8 10 12 14 2 4 6 8 10 12 14 7 8 9 10 11 12 x 7 8 9 10 11 12 x 1 (k2)?? x y (A) GLM (k = 1) (B) x 6 GLM (k = 7) 1 1 2 GLM 1 (A) log λ = β 1, k = 1 1 (B) x 6 (log λ = β 1 + β 2 x + + β 7 x 6, k = 7) model selection AIC AIC AIC AIC GLM AIC 1 2 R 2 (k2) 100
2012 11 01 k3 (2012-10-24 14:07 ) 3 (A) k = 1 6 7 8 9 10 (B) f k = 2 6 7 8 9 10 6 7 8 9 10 7 8 9 10 11 12 (C) x k = 2 7 8 9 10 11 12 6 7 8 9 10 7 8 9 10 11 12 (D) x + f k = 3 7 8 9 10 11 12 2 4 x λ k (A) (B) f : (C) x : (D) x + f : 1 100 y i?? R glm() 2 x i x ; 2 C;?? f i f ; 2 B;?? x + f ; 2 D;?? 3 2 ( ; 2 A 3 ; 2 ) λ exp β 1 β 1 4 maximum log likelihood 3 1 (A)
2012 11 01 k3 (2012-10-24 14:07 ) 4 2 : deviance 4 1 R glm() GLM log L({β j }) log L log L log L D = 2 log L log L -2 5 λ i x i λ i = exp(β 1 + β 2 x i ) x 2 C x log L -235.4 (D = 2 log L ) 470.8 1 log L D 2 log L D D D Null D Null D D glm()... Null Deviance: 89.51 Residual Deviance: 84.99 AIC: 474.8 470.8 Null Deviance, Residual Deviance, AIC 3 residual deviance D ( ) R full model 100 100 4 5 deviance -2 χ 2
2012 11 01 k3 (2012-10-24 14:07 ) 5 475.3 470.8 x Deviance 2 log L () 89.5 (Null Deviance) 85.0 (Residual Deviance) 385.8 3 deviance deviance (null deviance) (residual deviance) y i y i = {6, 6, 6, 12, 10, } 100 i {1, 2, 3} y i 6 {λ 1, λ 2, λ 3 } = {6, 6, 6} i = 4 y 4 12 λ 4 = 12 i = 5 y 5 10 λ 5 = 10...... 100 log L 6 > sum(log(dpois(d$y, lambda = d$y))) [1] -192.8898 D = 2 log L = 385.8 100 x D ( D) = 470.8 385.8 = 85.0 glm() Residual Deviance: 84.99 AIC: 474.8 Residual Deviance 385.8 6 R dpois() 100 {y i } {λ i } = {y 1, y 2, y 3, } i log L = 0
2012 11 01 k3 (2012-10-24 14:07 ) 6 3 7 λ i = exp(β 1 ) β 1 k = 1 R null model 8 log λ i = β 1 2 A R glm() glm(y ~ 1,...) > fit.null <- glm(formula = y ~ 1, family = poisson, data = d) fit.null β 1 2.06 9 Degrees of Freedom: 99 Total (i.e. Null); Null Deviance: 89.51 Residual Deviance: 89.51 AIC: 477.3 99 Residual 89.5 > loglik(fit.null) log Lik. -237.6432 (df=1) 475.3 D 385.8 89.5 k log L D = 2 log L 2 log L D 2 k 10 3 AIC 2 11 7 GLM 8 null hypothesis 9 1 (A) 10 f 2 11??
2012 11 01 k3 (2012-10-24 14:07 ) 7 2 k log L Deviance Residual deviance 2 k log L Deviance Residual 2 log L deviance 1-237.6 475.3 89.5 f 2-237.6 475.3 89.5 x 2-235.4 470.8 85.0 x + f 3-235.3 470.6 84.8 100-192.9 385.8 0.0 3 2 AIC k log L Deviance Residual 2 log L deviance AIC 1-237.6 475.3 89.5 477.3 f 2-237.6 475.3 89.5 479.3 x 2-235.4 470.8 85.0 474.8 x + f 3-235.3 470.6 84.8 476.6 100-192.9 385.8 0.0 585.8 model selection model selection criterion AIC (Akaike s information criterion) AIC goodness of fitgoodness of prediction 12 k AIC AIC = 2 { } = 2(log L k) = D + 2k AIC 2 AIC 3 x AIC 13 AIC 4 statistical test 12 13?? GLM R stepaic()
2012 11 01 k3 (2012-10-24 14:07 ) 8 14 AIC likelihood ratio test 15 16 17 parametric 18 19 5 4 20 21 null hypothesis alternative hypothesis 14 15?? 16 17 most powerful test 18 19 20 10 21 AIC
2012 11 01 k3 (2012-10-24 14:07 ) 9 AIC ( ) ( ) 4 AIC 4 test statistic 22 95% 5% significant level 23 Neyman-Pearson 24 6 : 5 λ = exp(β 1 + β 2 x i ) GLM x : λ i x i β 2 = 0; k = 1 x : λ i x i β 2 0; k = 2 22 23 24 Neyman-Pearson
2012 11 01 k3 (2012-10-24 14:07 ) 10 (A) (B) i y i x i yi 2 4 6 8 10 12 14 x = 470.8 = 475.3 7 8 9 10 11 12 x i 5 (A) f i (B) 100 x x i ; x x i 4 x AIC 3 k log L Deviance Residual 2 log L deviance AIC 1-237.6 475.3 89.5 477.3 x 2-235.4 470.8 85.0 474.8 100-192.9 385.8 0.0 585.8 5 4 25 475.3 x 470.8 4.5 26 likelihood ratio : L 1 L 2 = : exp( 237.6) x : exp( 235.4) -2 D 1,2 = 2 (log L 1 log L 2) 27 D 1 = 2 log L 1 D 2 = 2 log L 2 D 1,2 = D 1 D 2 D 1,2 x x D 1,2 = 4.5 x 4.5 4.5 25 26 27 2 (A) (C) 3-2 D 1,2 χ 2 8.2
2012 11 01 k3 (2012-10-24 14:07 ) 11 5 D 1,2 ( ) ( ) () () 7 5 Neyman-Pearson : k = 1, β 2 = 0 : x k = 2, β 2 0 28 Neyman-Pearson 9 5 : D 1,2 = 4.5 x β 2 0 type I error : x D 1,2 = 4.5 x β 2 = 0 type II error Neyman-Pearson 1 2 ˆβ 1 = 2.06 (p.6 ) 3 β 2 = 0(k = 1) β 2 0(k = 2) D 1,2 D 1,2 6 28 x alternative hypothesis
2012 11 01 k3 (2012-10-24 14:07 ) 12 ( ˆβ 1 = 2.06 ) x D 1,2 D 1,2 D 1,2 D 1,2 7 9 11 13 x 7 9 11 13 7 9 11 13 7 9 11 13 6 D 1,2 ˆβ 1 = 2.06, p.6 D 1,2 4 x D 1,2 4.5 P D 1,2 = 4.5 29 8 x D 1,2 4.5 P P P value P P : D 1,2 = 4.5 P : D 1,2 = 4.5 x! P Neyman-Pearson α 30 : P α : P < α : α α = 0.0520 1 31 29 9 30 P α 31 20 1
2012 11 01 k3 (2012-10-24 14:07 ) 13 8.1 (1) P D 1,2 4.5 P P parametric bootstrap 32 χ 2 (PB) 6 R glm() x fit1 fit2 fit1 fit2 > fit2$deviance [1] 84.993 x x D 1,2 > fit1$deviance - fit2$deviance [1] 4.5139 D 1,2 4.5 7.85 33 7.85 100 rpois() 100 > d$y.rnd <- rpois(100, lambda = mean(d$y)) mean(d$y) 7.85 glm() x 32 33 glm() ˆβ 1 = 2.06 exp(2.06) = 7.846 mean(d$y)
2012 11 01 k3 (2012-10-24 14:07 ) 14 > fit1 <- glm(y.rnd ~ 1, data = d, family = poisson) > fit2 <- glm(y.rnd ~ x, data = d, family = poisson) > fit1$deviance - fit2$deviance [1] 1.920331 x i 1.92 x : 1 mean(d$y) d$y.rnd 2 d$y.rnd,x glm() fit1, fit2 3 fit1$deviance - fit2$deviance PB 1 1000 D 1,2 34 PB R pb() 35 get.dd <- function(d) # { n.sample <- nrow(d) # y.mean <- mean(d$y) # d$y.rnd <- rpois(n.sample, lambda = y.mean) fit1 <- glm(y.rnd ~ 1, data = d, family = poisson) fit2 <- glm(y.rnd ~ x, data = d, family = poisson) fit1$deviance - fit2$deviance # } pb <- function(d, n.bootstrap) { sapply(1:n.bootstrap, get.dd, d) } pb.r 36 R R pb.r pb() 34 35 36 bootstrap method fit1 fit2$null.deviance - fit2$deviance D 1,2 web ( )
2012 11 01 k3 (2012-10-24 14:07 ) 15 0 50 150 250 350 D 1,2 = 4.5 0 5 10 15 20 x D 1,2 7 D 1,2 D 1,2 1000 x D 1,2 = 4.5 > source("pb.r") # pb.r > dd12 <- pb(d, n.bootstrap = 1000) R D 1,2 1000 37 dd12 summary() > summary(dd12) Min. 1st Qu. Median Mean 3rd Qu. Max. 7.229e-08 8.879e-02 4.752e-01 1.025e+00 1.339e+00 1.987e+01 7 D 1,2 4.5 > hist(dd12, 100) > abline(v = 4.5, lty = 2) 1000 D 1,2 4.5 > sum(dd12 >= 4.5) [1] 38 1000 38 4.5 4.5 38 / 1000 P = 0.038 P = 0.05 D 1,2 38 37 D 1,2 10 3 n.bootstrap 10 4 38 P = α D 1,2 (critical point) D 1,2 (critical region rejection region)
> quantile(dd12, 0.95) 95% 3.953957 2012 11 01 k3 (2012-10-24 14:07 ) 16 5% D 1,2 3.95 4.5 P 0.038 39 0.05 significantly different 40 x 8.2 (2) χ 2 PB 7 41 fit1 fit2 x > fit1 <- glm(y ~ 1, data = d, family = poisson) > fit2 <- glm(y ~ x, data = d, family = poisson) anova() 42 > anova(fit1, fit2, test = "Chisq") Analysis of Deviance Table Model 1: y ~ 1 Model 2: y ~ x Resid. Df Resid. Dev Df Deviance P(> Chi ) 1 99 89.507 2 98 84.993 1 4.54 0.034 D 1,2 43 1 χ 2 χ 2 distribution 39?? 40 P Neyman-Pearson P < α 41 42 anova() ANOVA analysis of variance analysis of deviance 43 x
2012 11 01 k3 (2012-10-24 14:07 ) 17 "Chisq" χ 2 D 1,2 4.5 P 0.034 P PB P = 0.038 χ 2 100 "Chisq" P PB 44 χ 2 t F 9 α = 0.05 D 1,2 P < α P α fail to reject Neyman-Pearson 45 Neyman-Pearson 7 P < α P α 5 P 2 46 Neyman-Pearson P P 2 P 2 1 P 2 ; power 47 44 PB 45 46 β β 1 P 2 47
2012 11 01 k3 (2012-10-24 14:07 ) 18 10 AIC AIC Neyman-Pearson AIC 1.000001 P < α AIC P 48 48 effect size