講義のーと : データ解析のための統計モデリング (Lecture notes: Statistical Modeling for Data Analysis), Lecture 5

Transcription:

Title: 講義のーと : データ解析のための統計モデリング (Lecture notes: Statistical Modeling for Data Analysis)
Author(s): 久保, 拓弥 (Kubo, Takuya)
Issue Date: 2008
Doc URL: http://hdl.handle.net/2115/49477
Type: learning object
Note: This lecture material can also be downloaded from the author's web page, http://hosho.ees.hokudai.ac.jp/~kubo/ce/EesLecture20 (URL truncated in the source)
Additional Information: There are other files related to this item in HUSCAP
File Information: kubostat2008e.pdf (Lecture 5)
Instructions for use: Hokkaido University Collection of Scholarly and Academic Papers (HUSCAP)

2008-11-13 (file revised 2012-07-01 10:11)

Statistical modeling lecture series (Oct-Nov 2008), Lecture 5 (of 5, plus 2 extra sessions), 2008-11-13
kubo@ees.hokudai.ac.jp
http://hosho.ees.hokudai.ac.jp/~kubo/ce/eeslecture2008.html
http://hosho.ees.hokudai.ac.jp/~kubo/ce/iwanamibook.html

Contents (section titles partly reconstructed from the body of the note):
1. Example: two Poisson GLMs for the seed-count data
2. Model selection or hypothesis testing?
3. A test based on the deviance difference
4. Two kinds of error
5. The steps of the test
6. Method (1): parametric bootstrap
7. Method (2): the chi-square approximation
8. What does "significant" mean?
9. Power and experimental design
10. Looking ahead: random effects and GLMM

Lecture 5: statistical tests for generalized linear models (GLMs)

Review of the previous lectures: a GLM is specified by a probability distribution, a link function, and a linear predictor; goodness of fit is measured by the (residual) deviance, and models are compared by AIC. Today's topic: instead of choosing between models by AIC, we ask whether the difference between two models is statistically significant, i.e. we perform a hypothesis test, and we discuss how model selection and hypothesis testing differ.

1. Example: two Poisson GLMs for the seed-count data

We reuse the example data of Lecture 3 (data3a.csv): for each plant i (i = 1, 2, ..., 100) we have the seed count y_i, the body size x_i, and the fertilization treatment f_i. Read the CSV file with read.csv():

  > d <- read.csv("data3a.csv")
  > head(d)
     y     x f
  1  6  8.31 C
  2  6  9.44 C
  3  6  9.50 C
  4 12  9.07 C
  5 10 10.16 C
  6  4  8.32 C

In this lecture the treatment f is ignored; we compare Poisson models of y that do and do not depend on x.

Model 2, in which the mean depends on body size, λ_i = exp(β1 + β2 x_i) (two parameters):

  > (fit2 <- glm(y ~ x, data = d, family = poisson))
  ...(output omitted)...
  Coefficients:
  (Intercept)            x
      1.29172      0.07566
  ...(output omitted)...
  Residual Deviance: 84.99        AIC: 474.8

Model 1, a constant-mean model, λ_i = exp(β1) (one parameter):

  > (fit1 <- glm(y ~ 1, data = d, family = poisson))
  ...(output omitted)...
  Coefficients:
  (Intercept)
        2.058
  ...(output omitted)...
  Residual Deviance: 89.51        AIC: 477.3

Plot the data together with both fitted models:

  > plot(d$x, d$y)                              # the observed data
  > xx <- seq(min(d$x), max(d$x), length = 50)  # a grid of x values
  > abline(h = mean(d$y), lty = 2, lwd = 2)     # model 1 (horizontal line)
  > lines(xx, exp(1.29 + 0.0757 * xx))          # model 2 (increasing curve)

[Figure: scatter plot of d$y against d$x, with the dashed horizontal line of model 1 (x-independent) and the curve of model 2 (x-dependent).]

Note that exp(2.058) = 7.83 = mean(d$y): the intercept-only model simply predicts the sample mean.
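As a quick arithmetic check of that last remark (done here in Python purely for illustration; the analysis in the note itself is in R): the intercept reported by the null fit, back-transformed through the log link, reproduces the sample mean.

```python
import math

# The intercept-only Poisson GLM uses a log link, so its predicted mean
# is exp(intercept). With the reported intercept 2.058:
predicted_mean = math.exp(2.058)
print(round(predicted_mean, 2))  # 7.83, equal to mean(d$y)
```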

The fits so far are summarized in the following table:

  model     k      log L     deviance     residual    AIC
                             (-2 log L)   deviance
  Model 1   1     -237.6     475.3        89.5        477.3
  Model 2   2     -235.4     470.8        85.0        474.8
  FULL      100   -192.9     385.8        0.0         585.8

where k is the number of free parameters and FULL is the full model with one parameter per observation. AIC (Akaike's information criterion) is

  AIC = -2 (maximum log-likelihood) + 2 (number of parameters)
      = (deviance) + 2k
      = -2 log L + 2k.

The residual deviance of a model is its deviance minus the deviance of the FULL model. By AIC, Model 2 (AIC 474.8) is better than Model 1 (AIC 477.3); the deviance difference between the two models is about 4.5. (For small samples a corrected criterion, AICc, is sometimes used instead of AIC.)
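The deviance and AIC columns can be recomputed from k and log L alone (a Python illustration, not part of the original note; Model 1's recomputed values come out as 475.2/477.2 rather than the table's 475.3/477.3 only because log L is rounded to one decimal):

```python
# Recompute deviance = -2 log L and AIC = -2 log L + 2k for each row
# of the model-comparison table.
models = {
    "Model 1": (1, -237.6),    # (k, maximum log-likelihood)
    "Model 2": (2, -235.4),
    "FULL":   (100, -192.9),
}
for name, (k, log_l) in models.items():
    deviance = -2 * log_l
    aic = deviance + 2 * k
    print(f"{name}: deviance {deviance:.1f}, AIC {aic:.1f}")
```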

Model 1 ("y does not depend on x_i") and Model 2 ("y depends on x_i"): which should we prefer? Model selection answers by prediction: choose whichever model has the smaller AIC (here, Model 2). A hypothesis test answers a different question: is the improvement of Model 2 over Model 1 larger than what chance alone would produce? In this framing the simpler Model 1 is the null hypothesis and Model 2 is the alternative hypothesis, and the test asks whether the data allow us to reject Model 1. The deviance of Model 1 is 475.3,

and the deviance of Model 2 is 470.8. The test statistic is the deviance difference between the two models. Note the contrast with AIC: in model selection the deviance is penalized by 2 per parameter, whereas in a test the deviance difference itself is the quantity to be judged.

3. A test based on the deviance difference

How large must the improvement in deviance be before we take it seriously? A test that compares two nested models by their deviance difference is a likelihood ratio test: Model 1 (the null hypothesis) against Model 2 (the alternative hypothesis), in the Neyman-Pearson framework. The deviance difference

  2 {log(likelihood of Model 2) - log(likelihood of Model 1)}

is twice the logarithm of the likelihood ratio, which is where the name comes from.
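Writing D = -2 log L for the deviance, the statistic can be expressed equivalently through log-likelihoods or through the table's deviances:

```latex
\Delta D_{1,2} \;=\; 2\,(\log L_2 - \log L_1) \;=\; D_1 - D_2 \;=\; 475.3 - 470.8 \;\approx\; 4.5
```

The residual deviances give the same difference, 89.5 - 85.0 = 4.5, because subtracting the common FULL-model deviance cancels.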

4. Two kinds of error

The observed deviance difference between Model 1 (null) and Model 2 (alternative) is 4.5. Whatever we decide, we can be wrong in two ways:

                                 Model 1 is true     Model 2 is true
  keep Model 1                   OK                  type II error
  reject Model 1, adopt Model 2  type I error        OK

  - Type I error: the deviance difference of 4.5 arose by chance even though Model 1 is true, but we reject Model 1 and adopt Model 2.
  - Type II error: the deviance difference of 4.5 reflects a real effect of x (Model 2 is true), but we keep Model 1.

A test of this kind controls only the probability of a type I error; it says nothing direct about type II errors. (It also presumes that the covariate x_i is the candidate explanation under consideration.)

5. The steps of the test

To test Model 1 against Model 2:

  1. Fit Model 1 to the data and record its deviance.
  2. Treat Model 1 as the null hypothesis.
  3. Fit Model 2 and record its deviance.
  4. Compute the deviance difference between Models 1 and 2; here it is about 4.5.
  5. Ask how probable a deviance difference at least this large is when Model 1 is true. This probability is the P value.

  - A small P value means that, under Model 1, a deviance difference of 4.5 would be a rare accident.
  - A large P value means that, under Model 1, a deviance difference of 4.5 arises easily by chance.

Having obtained P, we compare it with a significance level α chosen in advance:

  - P < α: reject Model 1; conclude that y depends on x_i, i.e. adopt Model 2.
  - P >= α: Model 1 cannot be rejected.

Why α = 0.05? There is no deep reason; tolerating a 1-in-20 type I error rate is simply the entrenched convention of the Neyman-Pearson framework. The remaining question is practical: when Model 1 is true, what is the probability that the deviance difference reaches 4.5, i.e. how do we compute P?

6. Method (1): parametric bootstrap

There are two common routes:

  1. estimate the P value by simulation: the parametric bootstrap;
  2. use a theoretical approximation to the distribution of the deviance difference (the chi-square approximation).

This section covers the parametric bootstrap. In R, fit both models with glm() as before; the fitted objects are fit1 and fit2.

  > (fit2 <- glm(y ~ x, data = d, family = poisson))
  ...(output omitted)...
  Residual Deviance: 84.99        AIC: 474.8

The fitted object stores its (residual) deviance among many other components:

  > sort(names(fit2))
   [1] "R"              "aic"            "boundary"
   [4] "call"           "coefficients"   "contrasts"
   [7] "control"        "converged"      "data"
  [10] "deviance"       "df.null"        "df.residual"
  ...(output omitted)...
  > fit2$deviance
  [1] 84.993

so the observed deviance difference between Models 1 and 2 is

  > fit1$deviance - fit2$deviance
  [1] 4.513941

i.e. about 4.51. The parametric bootstrap repeats the following: assume Model 1 is true, generate a new data set from the fitted Model 1, and compute the deviance difference on the simulated data. The simulated responses are drawn with rpois(), using the Model 1 estimate of the mean:

  > d$y.rnd <- rpois(100, lambda = mean(d$y))

(mean(d$y) is the maximum likelihood estimate of λ under Model 1, i.e. the value underlying fit1.)

The simulated column y.rnd is a data set generated from Model 1, in which y has no relationship to x whatsoever. Fit both models to it with glm():

  > fit1 <- glm(y.rnd ~ 1, data = d, family = poisson)
  > fit2 <- glm(y.rnd ~ x, data = d, family = poisson)
  > fit1$deviance - fit2$deviance
  [1] 1.920331

Even though y.rnd was generated with no dependence on x_i, fitting Model 2 still reduces the deviance a little, purely by chance: here by 1.92. One parametric-bootstrap replicate therefore consists of:

  1. generate d$y.rnd with rpois(), using mean(d$y) as the mean;
  2. fit Models 1 and 2 to d$y.rnd with glm(), giving fit1 and fit2;
  3. record fit1$deviance - fit2$deviance.

Repeating this many times (say 1000) yields the distribution of the deviance difference under Model 1. The function pb() below (in the file pb.r) packages these steps.

  pb <- function(d, n.bootstrap)
  {
      n.sample <- nrow(d)      # number of observations
      y.mean <- mean(d$y)      # Model 1 estimate of the mean
      v.d.dev12 <- sapply(     # vector of PB deviance differences
          1:n.bootstrap,
          function(i) {
              d$y.rnd <- rpois(n.sample, lambda = y.mean)
              fit1 <- glm(y.rnd ~ 1, data = d, family = poisson)
              fit2 <- glm(y.rnd ~ x, data = d, family = poisson)
              fit1$deviance - fit2$deviance  # deviance difference
          }
      )
      v.d.dev12                # return the vector of deviance differences
  }
  # Note: fitting fit1 is not strictly necessary, because
  #   fit2$null.deviance - fit2$deviance
  # gives the same deviance difference without constructing fit1.

Download pb.r from the lecture web page, save it in R's working directory, and load it with source("pb.r"). Then run 1000 bootstrap replicates:

  > source("pb.r")
  > diff.dev12 <- pb(d, n.bootstrap = 1000)   # takes a little while
  ...(output omitted)...

This fits 2000 GLMs and stores the 1000 deviance differences in the vector diff.dev12:

  > summary(diff.dev12)
       Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
  7.229e-08 8.879e-02 4.752e-01 1.025e+00 1.339e+00 1.987e+01

The largest simulated deviance difference is 19.9. How does the observed value 4.51 compare with this distribution?
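The same procedure can be sketched outside R. The following pure-Python sketch is illustrative only: the Poisson sampler, the small Newton-method fit standing in for glm(), the synthetic body sizes x, and the use of 7.83 in the role of mean(d$y) are all assumptions of this sketch, not part of the original note. Its 95% point should come out near the chi-square(1) value 3.84, consistent with the 3.95 found above.

```python
import math
import random

def rpois(lam, rng):
    # Poisson sampler (Knuth's method): count uniform draws until their
    # running product falls below exp(-lam).
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def poisson_deviance(x, y, with_x):
    # Fit log(mu_i) = b0 (+ b1 * x_i) by Newton's method and return the
    # residual deviance, standing in for glm(..., family = poisson)$deviance.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(100):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))                # score wrt b0
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # score wrt b1
        h00 = sum(mu)                                    # Fisher information
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        if with_x:
            det = h00 * h11 - h01 * h01
            d0 = (h11 * g0 - h01 * g1) / det
            d1 = (h00 * g1 - h01 * g0) / det
        else:
            d0, d1 = g0 / h00, 0.0
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Poisson residual deviance: 2 * sum(y*log(y/mu) - (y - mu)); the log
    # term is taken as 0 when y = 0.
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))

def pb(x, y_mean, n_bootstrap, rng):
    # Parametric bootstrap under model 1: simulate y ~ Poisson(y_mean),
    # refit both models, record the deviance difference.
    diffs = []
    for _ in range(n_bootstrap):
        y_rnd = [rpois(y_mean, rng) for _ in x]
        diffs.append(poisson_deviance(x, y_rnd, with_x=False)
                     - poisson_deviance(x, y_rnd, with_x=True))
    return diffs

rng = random.Random(1)
x = [7.0 + 5.0 * i / 99 for i in range(100)]   # 100 body sizes spanning 7..12
diff_dev12 = sorted(pb(x, 7.83, 200, rng))     # 7.83 plays the role of mean(d$y)
q95 = diff_dev12[int(0.95 * len(diff_dev12))]  # should land near chi2(1)'s 3.84
```

With fewer replicates than the note's 1000, the quantile is noisier, but the deviance difference is always nonnegative (the models are nested) and its distribution hugs chi-square with 1 degree of freedom.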

  > hist(diff.dev12, 100)
  > abline(v = 4.51, lty = 2)

[Figure: histogram of the 1000 bootstrap deviance differences (diff.dev12); most of the mass lies below about 5, and a dashed vertical line marks the observed value 4.51.]

How many of the 1000 simulated deviance differences reach the observed 4.51?

  > sum(diff.dev12 >= 4.51)
  [1] 38

Since 38 of the 1000 replicates are at least 4.51, the estimated P value is P = 38/1000 = 0.038, which is below 0.05. Equivalently, compare the observed value with the 95% point of the bootstrap distribution:

  > quantile(diff.dev12, 0.95)
       95%
  3.953957

The observed deviance difference 4.51 exceeds 3.95, so under Model 1 a difference this large has probability below 5%: the two models are judged significantly different. We reject Model 1 (the null hypothesis) and adopt Model 2. This is the parametric bootstrap version of the likelihood ratio test; report the P value itself (0.038), not merely "P < 0.05".

The parametric bootstrap is simple and general, but R also provides a built-in shortcut.

7. Method (2): the chi-square approximation

Instead of simulating the null distribution of the deviance difference, we can use a theoretical approximation via anova() with test = "Chisq":

  > fit1 <- glm(y ~ 1, data = d, family = poisson)   # Model 1
  > fit2 <- glm(y ~ x, data = d, family = poisson)   # Model 2
  > anova(fit1, fit2, test = "Chisq")   # analysis of deviance
  Analysis of Deviance Table

  Model 1: y ~ 1
  Model 2: y ~ x
    Resid. Df Resid. Dev  Df Deviance P(>|Chi|)
  1        99     89.507
  2        98     84.993   1    4.514     0.034

(Despite the ANOVA-like heading, this is an analysis of deviance, not a classical analysis of variance.) Where the parametric bootstrap approximated the null distribution by the 1000 simulated values in diff.dev12, anova() approximates it by a chi-square distribution whose degrees of freedom equal the difference in the number of parameters, here 2 - 1 = 1. Under that approximation, the observed deviance difference 4.51 gives P = 0.034.
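For a single degree of freedom the chi-square tail probability has a closed form via the complementary error function, so the P value printed by anova() can be checked in a few lines (a Python illustration, not part of the original note):

```python
import math

def chisq1_tail(x):
    # Tail probability of the chi-square distribution with 1 degree of
    # freedom: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

p = chisq1_tail(4.514)   # the observed deviance difference
print(round(p, 3))       # 0.034, matching anova()'s P(>|Chi|)
```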

This is close to the parametric-bootstrap result, where the probability of a deviance difference of at least 4.51 came out as 0.038. (The chi-square approximation is an asymptotic, large-sample result; for small samples the parametric bootstrap can be the more trustworthy of the two.)

8. What does "significant" mean?

Recall the decision table for Model 1 (null hypothesis) versus Model 2 (alternative):

                                 Model 1 is true     Model 2 is true
  keep Model 1                   OK                  type II error
  reject Model 1, adopt Model 2  type I error        OK

The significance level α = 0.05 is the probability of a type I error that we agree to tolerate, fixed before looking at the data.

  - P < α: reject Model 1 and conclude that y depends on x_i, i.e. adopt Model 2.
  - P >= α: Model 1 is not rejected. Note carefully what this does and does not mean: failing to reject Model 1 is not evidence that Model 1 is true! It means only that the data are insufficient to rule it out.

The probability of a type II error, usually written β, is not controlled by this procedure at all: the Neyman-Pearson recipe fixes α and leaves β unexamined unless it is analyzed separately.

Common misreadings of the P value:

  1. the P value is not the probability that the null hypothesis is true;
  2. a smaller P value does not mean a larger or more important effect;
  3. "P < α" is a statement about the deviance difference under the chi-square (or bootstrap) null distribution, not about scientific importance.

The cutoff P = 0.05 is a convention, not a property of nature, and its mechanical use has been criticized at length in the statistical literature.

9. Power and experimental design

What can be done about type II errors, i.e. about real effects that the test fails to detect?

2008 11 13 (2012-07-01 10:11 ) 18 : 40 P P 0.05 P 40. ( ) ( ; power) 41 : 41. 1 β (experimental design) 42 43 44 45 x i () x i 42. 43. 44. 45.

10. Looking ahead: random effects and GLMM

Real data are usually messier than this example: individuals differ for reasons not captured by the measured covariates. Such individual differences can be represented by adding random effects to a GLM, which yields a generalized linear mixed model (GLMM). Why bother with a GLMM? Because ignoring individual-level variation (overdispersion) can distort both the estimates and the tests. The coming lectures take up GLMMs: what random effects are, and how models that contain them are estimated.
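In symbols (notation assumed here, not taken from the note): for the Poisson example, a GLMM adds an individual-level random effect r_i to the GLM's linear predictor,

```latex
\log \lambda_i \;=\; \beta_1 + \beta_2 x_i + r_i,
\qquad r_i \sim \mathrm{N}(0, s^2),
```

where β1 and β2 are fixed effects shared by all individuals, and the r_i are random effects, one per individual, whose spread s is estimated from the data.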

Beyond GLMMs lie Bayesian models estimated by MCMC (e.g. with WinBUGS), in which random effects fit naturally. The hierarchy to keep in mind:

  GLM:  fixed effects only
  GLMM: fixed effects + random effects (i.e. GLM + random effects)