Lecture notes: Statistical modeling for data analysis, Part 3 (講義のーと : データ解析のための統計モデリング, 第3回)


Title: 講義のーと : データ解析のための統計モデリング (Lecture notes: Statistical modeling for data analysis)
Author(s): 久保, 拓弥 (Kubo, Takuya)
Issue Date: 2008
Doc URL: http://hdl.handle.net/2115/49477
Type: learningobject
Note: This lecture material can also be downloaded from the author's web page, http://hosho.ees.hokudai.ac.jp/~kub
Note (URL): http://hosho.ees.hokudai.ac.jp/~kubo/ce/EesLecture20
Additional Information: There are other files related to this item in HUSCAP.
File Information: kubostat2008c.pdf (Part 3 / 第 3 回)
Hokkaido University Collection of Scholarly and Academic Papers (HUSCAP)

Part 3 (2008-11-06) of the seminar series on statistical modeling for data analysis (October to November 2008, 5 sessions + 2): generalized linear models (GLM). Notes version of 2012-07-01 10:11.
Contact: kubo@ees.hokudai.ac.jp
Lecture web page: http://hosho.ees.hokudai.ac.jp/~kubo/ce/eeslecture2008.html
See also: http://hosho.ees.hokudai.ac.jp/~kubo/ce/iwanamibook.html

Contents
1. What is a generalized linear model (GLM)?
2. The parts of a GLM: probability distribution, link function, linear predictor
3. Example data for the GLM: seed counts of fictitious plants
4. Poisson regression: glm(..., family = poisson)
5. A factor as the explanatory variable
6. Numeric and factor explanatory variables together
7. What is deviance?
8. Model selection with AIC
9. Summary

The key quantity in this part 3 is the Poisson mean λ. It is expressed in terms of a linear combination of the explanatory variables,

λ = (some function of)( β1 + β2 × (explanatory variable 1) + β3 × (explanatory variable 2) + ... ),

and the coefficients {βj} = {β1, β2, ...} are what we estimate from the data.

1. What is a generalized linear model (GLM)?

Later parts of this series deal with Bayesian models estimated by MCMC and with models that contain random effects (generalized linear mixed models, GLMM). This lecture covers the step before those: the generalized linear model (GLM), a model built from fixed effects only.

[Figure 1: a map of the statistical models treated in this series; the general linear model (ANOVA, ANCOVA, and so on) and its generalizations.]

Do not confuse the two similar names: the general linear model is the classical normal-error framework that includes ANOVA and ANCOVA, while the generalized linear model extends it to other probability distributions.

The example data of this lecture are counts: numbers of seeds, which take the values 0, 1, 2, 3, .... To build a statistical model for such data we have to answer two questions: (1) which probability distribution should represent the scatter of the data, and (2) how does the mean of that distribution depend on the explanatory variables?

[Figure 2]

A statistical model specified as the combination of a probability distribution, a linear predictor, and a link function is called a generalized linear model (GLM).

[Figure 3]

2. The parts of a GLM: probability distribution, link function, linear predictor

A GLM is specified by choosing three parts: a probability distribution, a link function, and a linear predictor. In R, GLMs are fitted with the glm() function. (The combination of a binomial distribution with a logit link is logistic regression.) The table below shows, for each probability distribution, the R function that generates random numbers from it and the corresponding way to fit a GLM:

distribution         random numbers    GLM fitting in R
binomial             rbinom()          glm(family = binomial)
Poisson              rpois()           glm(family = poisson)
negative binomial    rnbinom()         glm.nb() (in the MASS package)
gamma                rgamma()          glm(family = Gamma)
normal (Gaussian)    rnorm()           glm(family = gaussian)
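As a small illustration of this correspondence (my sketch, not part of the original note), one can generate Poisson counts with rpois() and then recover their mean with glm(family = poisson); the sample size 100 and mean 3.5 below are arbitrary choices.

# generate Poisson counts and recover their mean with a Poisson GLM
set.seed(1)
y.sim   <- rpois(100, lambda = 3.5)           # 100 counts with true mean 3.5
fit.sim <- glm(y.sim ~ 1, family = poisson)   # intercept-only Poisson GLM
exp(coef(fit.sim))                            # back-transform the log-link intercept; close to 3.5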

The linear predictor is a linear combination of the explanatory variables. Writing the mean of the response variable as μ, the simplest case is

μ = β1 + β2 × (explanatory variable 1) + β3 × (explanatory variable 2) + ... ,

where each coefficient βj multiplies one explanatory variable (β1 multiplies the constant 1). This right-hand side, z = ∑_j βj xj, is the linear predictor of the GLM.

The link function connects the mean μ of the response variable to the linear predictor z. With the identity link, μ = z. With the log link, log μ = z, that is,

μ = exp( β1 + β2 × (explanatory variable 1) + β3 × (explanatory variable 2) + ... ).

In R's glm(), each family has a default link function, its canonical link function; you choose the distribution (and, if necessary, another link) through the family argument. The link functions and the way the variance depends on the mean are:

family      link        variance as a function of the mean μ
binomial    logit       μ(1 - μ)   (for counts out of N trials: N p(1 - p))
poisson     log         μ
gamma       log (?)     μ^2
gaussian    identity    constant
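A one-line way to see why the log link is convenient for count data (my illustration, not from the note): the identity link can produce negative means, while exp() cannot.

z <- seq(-3, 3, by = 1)                       # some values of the linear predictor
cbind(z, mu.identity = z, mu.log = exp(z))    # the identity link goes negative; exp(z) is always > 0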

3. Example data for the GLM: seed counts of fictitious plants

[Figure 4: the fictitious plant data; for each individual i we record the seed count y_i, the body size x_i, and the treatment f_i (C: control, T: fertilized).]

The example data describe 100 fictitious plant individuals i (i = 1, 2, ..., 100). For each individual i we have:

y_i: the number of seeds (the response variable);
x_i: the body size (a numeric explanatory variable);
f_i: the fertilization treatment. The first 50 individuals (i = 1, ..., 50) received no fertilizer (level C, control) and the remaining 50 (i = 51, ..., 100) were fertilized (level T, treatment).

Data of this kind are usually typed into a spreadsheet and saved as a CSV (comma separated values) file; here the file is data3a.csv, available from the lecture web page. Read it into R with

> d <- read.csv("data3a.csv")

which stores the contents as a data.frame object named d. (If R cannot find the file, first move to the directory that contains data3a.csv, for example with setwd().)
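If you do not have data3a.csv at hand, you can follow along with a simulated data set of the same structure. This is my own sketch, not part of the lecture, and the coefficient values 1.3 and 0.08 are arbitrary, so the numerical output later in the note will differ.

set.seed(3)
x <- rnorm(100, mean = 10, sd = 1)               # body sizes
f <- factor(rep(c("C", "T"), each = 50))         # 50 control plants, then 50 fertilized plants
lambda <- exp(1.3 + 0.08 * x)                    # Poisson mean through a log link
d <- data.frame(y = rpois(100, lambda), x = x, f = f)
# write.csv(d, "data3a.csv", row.names = FALSE)  # optionally save it as a CSV file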

Typing d and pressing ENTER displays the whole table:

      y     x f
1     6  8.31 C
2     6  9.44 C
3     6  9.50 C
 ... (omitted) ...
99    7 10.86 T
100   9  9.97 T

All 100 rows are shown, which is usually more than we want to look at; head(d) shows only the first 6 rows, and head(d, 10) the first 10. The columns y, x and f of the data.frame d can be extracted with the $ operator:

> d$x
  [1]  8.31  9.44  9.50  9.07 10.16  8.32 10.61 10.06
  [9]  9.93 10.43 10.36 10.15 10.92  8.85  9.42 11.11
 ... (omitted) ...
 [97]  8.52 10.24 10.86  9.97

> d$y
  [1]  6  6  6 12 10  4  9  9  9 11  6 10  6 10 11  8
 [17]  3  8  5  5  4 11  5 10  6  6  7  9  3 10  2  9
 ... (omitted) ...
 [97]  6  8  7  9

The column f looks different:

> d$f
  [1] C C C C C C C C C C C C C C C C C C C C C C C C C
 [26] C C C C C C C C C C C C C C C C C C C C C C C C C
 [51] T T T T T T T T T T T T T T T T T T T T T T T T T
 [76] T T T T T T T T T T T T T T T T T T T T T T T T T
Levels: C T

d$f is a factor: when read.csv() reads a column whose entries are the labels C and T, it stores it as a factor with the two levels C and T (the levels are ordered alphabetically).
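Two handy ways to inspect a factor (my aside, not in the note): levels() lists the levels and table() counts the rows in each. Note also that in recent versions of R (4.0.0 and later) read.csv() no longer converts character columns to factors automatically, so you may need to convert f yourself.

levels(d$f)            # "C" "T"
table(d$f)             # 50 rows in each level
# d$f <- factor(d$f)   # needed if read.csv() left f as a character column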

The class of each column can be checked with the class() function:

> class(d)    # d is a data.frame
[1] "data.frame"
> class(d$y)  # y is a vector of integers
[1] "integer"
> class(d$x)  # x is a numeric (real-valued) vector
[1] "numeric"
> class(d$f)  # f is a factor
[1] "factor"

(In R, everything on a line after # is a comment.)

The summary() function summarizes a data.frame column by column:

> summary(d)
       y               x           f     
 Min.   : 2.00   Min.   : 7.190   C:50  
 1st Qu.: 6.00   1st Qu.: 9.428   T:50  
 Median : 8.00   Median :10.155         
 Mean   : 7.83   Mean   :10.089         
 3rd Qu.:10.00   3rd Qu.:10.685         
 Max.   :15.00   Max.   :12.400         

For the numeric columns y and x we get the minimum, quartiles, mean and maximum; for the factor f we get the number of rows in each level (50 C and 50 T). Numerical summaries are useful, but always plot the data as well! A scatter plot of y against x, with different plotting symbols for the two levels of f:

> plot(d$x, d$y, pch = c(21, 19)[d$f])
> legend("topleft", legend = c("C", "T"), pch = c(21, 19))

[Figure: scatter plot of d$y (about 2 to 14) against d$x (about 7 to 12); open circles are C individuals, filled circles are T individuals.]

When the first argument of plot() is a factor, R draws a box-whisker plot instead:

> plot(d$f, d$y)

[Figure: box-whisker plots of d$y (about 2 to 14) for the two levels C and T.]

In a box-whisker plot the horizontal edges of the box mark the 75%, 50% (median) and 25% points of the data. Quick graphical checks with plot(), together with summary(), are always worth doing before any model is fitted.

4. Poisson regression: glm(..., family = poisson)

Now let us build a GLM for the data in d. The response variable, the seed count y_i, takes the values 0, 1, 2, ..., so we assume that y_i follows a Poisson distribution whose mean λ_i depends on the body size x_i:

λ_i = exp(β1 + β2 x_i).

For the moment the treatment f_i is left out of the model; it will be added later.

Taking logarithms, λ_i = exp(β1 + β2 x_i) can equally be written as

log λ_i = β1 + β2 x_i,

so this model connects the linear predictor z_i = β1 + β2 x_i to the mean through the log link function. Why use the log link here? A Poisson mean must satisfy λ_i > 0, and λ_i = exp(z_i) is positive whatever value z_i (= β1 + β2 x_i) takes. The curves below show how λ_i depends on x_i for two choices of (β1, β2); neither curve ever drops to zero or below:

> xi <- seq(-4, 5, 0.1)
> plot(xi, exp(-1 + 0.4 * xi), type = "l", ylab = "lambda", lwd = 2)
> lines(xi, exp(-2 - 0.8 * xi), lwd = 2, lty = 2)  # a second, decreasing curve
> abline(v = 0, lwd = 2, lty = 3)

[Figure: lambda (0 to about 2) plotted against xi (-4 to 4) for λ = exp(-1 + 0.4 x) and λ = exp(-2 - 0.8 x).]

Under the model λ_i = exp(β1 + β2 x_i), the probability of observing y_i seeds on individual i is the Poisson probability

p(y_i | {βj}, x_i) = λ_i^{y_i} exp(-λ_i) / y_i! ,

so the likelihood of the whole data set is the product over the 100 individuals,

L({βj} | {y_i}, {x_i}) = ∏_{i=1}^{100} λ_i^{y_i} exp(-λ_i) / y_i! ,

and the log-likelihood is

log L({βj} | {y_i}, {x_i}) = ∑_{i=1}^{100} log{ λ_i^{y_i} exp(-λ_i) / y_i! }.
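Before handing the problem to glm(), it may help to see that this log-likelihood can be maximized directly. The sketch below is my own check, not part of the lecture note: it writes the negative log-likelihood as an R function and minimizes it with optim().

negloglik <- function(beta, d) {
  lambda <- exp(beta[1] + beta[2] * d$x)          # the log-link model for the mean
  -sum(dpois(d$y, lambda = lambda, log = TRUE))   # minus the Poisson log-likelihood
}
optim(c(0, 0), negloglik, d = d)$par   # for data3a.csv, roughly (1.29, 0.076); compare glm() below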

The values of β1 and β2 that maximize this log-likelihood are the maximum likelihood estimates. For a GLM, R computes them with glm():

> fit <- glm(y ~ x, data = d, family = poisson)

This is the model fitting step. The resulting object fit stores many components; names(fit) lists them.

[Figure 5: how the parts of the glm() call (the model formula and family = poisson with its log link) correspond to the parts of the statistical model.]

Specifying family = poisson is the same as family = poisson(link = "log"), because "log" is the default link function of the poisson family (its canonical link function). The fitted object can be displayed simply by typing its name:

> fit   # equivalent to print(fit)

Call:  glm(formula = y ~ x, family = poisson, data = d)

Coefficients:
(Intercept)            x  
    1.29172      0.07566  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      89.51 
Residual Deviance: 84.99        AIC: 474.8

More information is displayed by summary():

> summary(fit)

Call:
glm(formula = y ~ x, family = poisson, data = d)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3679  -0.7348  -0.1775   0.6987   2.3760  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.29172    0.36369   3.552 0.000383 ***
x            0.07566    0.03560   2.125 0.033580 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 89.507  on 99  degrees of freedom
Residual deviance: 84.993  on 98  degrees of freedom
AIC: 474.77

Number of Fisher Scoring iterations: 4

The deviances in this output are discussed later; first let us look at the table of coefficients:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.29172    0.36369   3.552 0.000383 ***
x            0.07566    0.03560   2.125 0.033580 *  

The row (Intercept) refers to β1 (the intercept of the linear predictor) and the row x to β2 (the coefficient of body size). The Estimate column gives the maximum likelihood estimates: about 1.29 for β1 and about 0.0757 for β2. What does the Std. Error column mean?

Std. Error is the standard error (SE) of each estimate, a measure of how uncertain the estimates of β1 and β2 are. The z value column is the estimate divided by its SE, the so-called Wald statistic.

[Figure 6: the estimates of β1 and β2 drawn with intervals of width SE around them, on a scale from 0.0 to 1.5.]

Pr(>|z|) is a probability computed from the z value; it plays a role similar to a p value, although it is not exactly the p value of a Neyman-Pearson style significance test. These Wald-type standard errors and probabilities reported by summary(glm(...)) are best read as rough indications of how reliable each estimate is.
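The same numbers can be pulled out of the fitted object programmatically; this snippet is my addition, using standard R accessor functions rather than anything specific to the lecture.

coef(summary(fit))                   # matrix of Estimate, Std. Error, z value, Pr(>|z|)
sqrt(diag(vcov(fit)))                # the standard errors
coef(fit) / sqrt(diag(vcov(fit)))    # the Wald z values
confint.default(fit)                 # Wald-type 95 percent confidence intervals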

The fitted model can also be used for prediction: let us draw the predicted mean seed number as a function of body size on top of the data.

> plot(d$x, d$y, pch = c(21, 19)[d$f])
> xx <- seq(min(d$x), max(d$x), length = 100)
> lines(xx, exp(1.29 + 0.0757 * xx), lwd = 2)

[Figure: the scatter plot of d$y against d$x with the fitted curve λ = exp(1.29 + 0.0757 x) added.]

The call lines(xx, exp(1.29 + 0.0757 * xx), lwd = 2) plugs the estimates into λ = exp(β1 + β2 x) by hand. The same curve can be drawn without typing the estimated values, using predict():

> yy <- predict(fit, newdata = data.frame(x = xx), type = "response")
> lines(xx, yy, lwd = 2)

The maximized log-likelihood, that is, log L({βj}) evaluated at the maximum likelihood estimates, is returned by logLik():

> logLik(fit)   # fit <- glm(y ~ x, ...)
'log Lik.' -235.3863 (df=2)

The value is about -235.4, and df=2 records that two parameters, β1 and β2, were estimated.
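As a check of what logLik() is doing (my addition, not in the note), the same number can be computed by summing the Poisson log-probabilities at the fitted means:

lambda.hat <- predict(fit, type = "response")       # fitted means for the 100 individuals
sum(dpois(d$y, lambda = lambda.hat, log = TRUE))    # about -235.4, matching logLik(fit)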

5. A factor as the explanatory variable

Next we fit a model in which the explanatory variable is not the body size x but the treatment f, which is stored as a factor:

> d$f
  [1] C C C C C C C C C C C C C C C C C C C C C C C C C
 ... (omitted) ...
 [76] T T T T T T T T T T T T T T T T T T T T T T T T T
Levels: C T

R's glm() accepts numeric, integer and factor columns as explanatory variables, so f can go into the model formula directly:

> fit.f <- glm(y ~ f, data = d, family = poisson)
> fit.f

Call:  glm(formula = y ~ f, family = poisson, data = d)

Coefficients:
(Intercept)           fT  
    2.05156      0.01277  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      89.51 
Residual Deviance: 89.48        AIC: 479.3

In the Coefficients of glm(y ~ f, ...), (Intercept) is again the intercept, and fT is the coefficient of a dummy variable that codes the factor f: the first level C is the baseline (dummy value 0) and level T gets dummy value 1, so a two-level factor contributes one coefficient. Under this model the mean seed number of individual i is

λ_i = exp(2.05 + 0)      = exp(2.05)    if f_i = C,
λ_i = exp(2.05 + 0.0128) = exp(2.0628)  if f_i = T.
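The dummy coding that glm() constructs for the factor f can be inspected with model.matrix(); this is my aside, not part of the lecture.

head(model.matrix(~ f, data = d))   # the column fT is 0 for the C individuals ...
tail(model.matrix(~ f, data = d))   # ... and 1 for the T individuals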

The maximized log-likelihood of this model is

> logLik(fit.f)   # fit.f <- glm(y ~ f, ...)
'log Lik.' -237.6273 (df=2)

6. Numeric and factor explanatory variables together

The body size x and the treatment f can both be put into the glm() model formula:

> fit.full <- glm(y ~ x + f, data = d, family = poisson)
> fit.full

Call:  glm(formula = y ~ x + f, family = poisson, data = d)

Coefficients:
(Intercept)            x           fT  
    1.26311      0.08007     -0.03200  

Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
Null Deviance:      89.51 
Residual Deviance: 84.81        AIC: 476.6

With the same dummy coding of fT as before and the log link, the mean of individual i is

λ_i = exp(1.26 + 0.08 x_i)           if f_i = C,
λ_i = exp(1.26 + 0.08 x_i - 0.032)   if f_i = T.

Because of the log link, the treatment acts multiplicatively on the mean:

exp(1.26 + 0.08 x_i - 0.032) = exp(1.26 + 0.08 x_i) × exp(-0.032)
                             = (mean of a C individual of the same size) × (effect of fertilization).
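The multiplicative effects can be read off directly by exponentiating the coefficients; this one-liner is my addition.

exp(coef(fit.full))   # about 3.54 (baseline), 1.083 (per unit of body size), 0.969 (effect of T)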

Numerically, exp(-0.032) ≈ 0.969, so under the model glm(y ~ x + f, ...) fertilization is estimated to multiply the expected seed number by about 0.969, a slight decrease. The maximized log-likelihood of this model is

> logLik(fit.full)   # fit.full <- glm(y ~ x + f, ...)
'log Lik.' -235.2937 (df=3)

7. What is deviance?

So far the goodness of fit of each model has been summarized by its maximized log-likelihood, the value of log L({βj}) at the maximum likelihood estimates, obtained with logLik(). The output of summary(glm(y ...)) reports the same kind of information in the form of deviances. The deviance D is simply the maximized log-likelihood multiplied by -2:

D = -2 log L.

Let us see what this gives for the Poisson regression on body size, the model with λ_i = exp(β1 + β2 x_i).

Its maximized log-likelihood was about -235.4, so its deviance is D = -2 log L ≈ 470.8. However, the glm() output shows

 ... (omitted) ...
Null Deviance:      89.51 
Residual Deviance: 84.99        AIC: 474.8

There is no 470.8 anywhere, only a Null Deviance, a Residual Deviance and an AIC. The residual deviance is the deviance measured relative to the smallest deviance achievable for these data:

residual deviance = D - (deviance of the full model).

The full model is the model with one free parameter for every data point (here 100 parameters for the 100 observations!), so that each λ_i can be set equal to the observed count d$y[i]. By contrast, the Poisson regression above used only 2 parameters, β1 and β2. Recall the data,

> d$y
  [1]  6  6  6 12 10  4  9  9  9 11  6 10  6 10 11  8
 ... (omitted) ...

so the full model assigns

individual i    observed y_i    λ_i under the full model
1, 2, 3         6               6
4               12              12
5               10              10
...             ...             ...
100             9               9

How large is the log-likelihood of this model?

It is easy to compute: set each λ_i equal to the observed count and add up the Poisson log-probabilities:

> sum(dpois(d$y, lambda = d$y, log = TRUE))
[1] -192.8898

This is the largest log-likelihood attainable with a Poisson model for these data, much larger than the -235.4 of the model with x. The corresponding deviance, D = -2 log L ≈ 385.8, is therefore the smallest possible deviance. The residual deviance of the x model is then

470.8 - 385.8 = 85.0,

which is exactly the "Residual Deviance: 84.99" printed by glm().

What is the poorest-fitting model of this kind? The one at the opposite extreme from the full model: λ_i = exp(β1), a single common mean for all 100 individuals, estimated with one parameter. R calls this the null model (as in "null hypothesis"):

> fit.null <- glm(formula = y ~ 1, family = poisson, data = d)
> fit.null

Call:  glm(formula = y ~ 1, family = poisson, data = d)

Coefficients:
(Intercept)  
      2.058  

Degrees of Freedom: 99 Total (i.e. Null);  99 Residual
Null Deviance:      89.51 
Residual Deviance: 89.51        AIC: 477.3

Its residual deviance, 89.5, is the largest among the models considered here.
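A small check of my own: the intercept of the null model is nothing but the logarithm of the sample mean of the counts.

log(mean(d$y))   # about 2.058, the (Intercept) of fit.null
exp(2.058)       # about 7.83, the mean seed number reported by summary(d)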

> logLik(fit.null)   # fit.null <- glm(y ~ 1, ...)
'log Lik.' -237.6432 (df=1)

The Null Deviance reported by glm() is precisely the residual deviance of this null model. Look again at the output for the model with x:

 ... (omitted) ...
Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      89.51 
Residual Deviance: 84.99        AIC: 474.8

Taking the body size x_i into the model improves (reduces) the residual deviance from 89.5 to 85.0. The degrees of freedom in the first line are the sample size minus the number of estimated parameters: 99 (= 100 - 1) for the null model and 98 (= 100 - 2) for the model with x.

The models fitted so far are compared below by their number of parameters k, maximized log-likelihood log L, deviance D = -2 log L, and residual deviance:

Model    k     log L     Deviance (-2 log L)    Residual deviance
NULL     1     -237.6    475.3                  89.5
f        2     -237.6    475.3                  89.5
x        2     -235.4    470.8                  85.0
x + f    3     -235.3    470.6                  84.8
FULL     100   -192.9    385.8                  0.0
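Most columns of this table can be reproduced directly in R; the snippet below is my own, using the fitted objects defined above.

models <- list(null = fit.null, f = fit.f, x = fit, x.plus.f = fit.full)
sapply(models, logLik)          # the log L column
-2 * sapply(models, logLik)     # the deviance column, -2 log L
sapply(models, deviance)        # the residual-deviance column
D.full <- -2 * sum(dpois(d$y, lambda = d$y, log = TRUE))   # deviance of the full model, about 385.8
-2 * sapply(models, logLik) - D.full                       # residual deviances computed by hand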

As the table shows, the residual deviance (the badness of fit measured by deviance) always decreases as parameters are added, and by this criterion the full model fits best of all. Fit alone therefore cannot tell us which model to use.

8. Model selection with AIC

Choosing among candidate models is the problem of model selection. A simple and widely used criterion is AIC (Akaike's information criterion),

AIC = -2 (maximized log-likelihood) + 2 (number of parameters) = -2 log L + 2k,

and the model with the smallest AIC is selected. Adding an AIC column to the table:

Model    k     log L     Deviance (-2 log L)    Residual deviance    AIC
NULL     1     -237.6    475.3                  89.5                 477.3
f        2     -237.6    475.3                  89.5                 479.3
x        2     -235.4    470.8                  85.0                 474.8
x + f    3     -235.3    470.6                  84.8                 476.6
FULL     100   -192.9    385.8                  0.0                  585.8

The smallest AIC, 474.8, belongs to Model x, so AIC selects the model with body size as the only explanatory variable. (For models fitted with glm(), model selection by AIC can be automated with stepAIC() in the MASS package.)
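The AIC values in the table can be verified with the AIC() function, or from the definition; this check is my addition.

AIC(fit.null, fit.f, fit, fit.full)      # AIC of the four fitted models
-2 * as.numeric(logLik(fit)) + 2 * 2     # the x model from the definition: about 474.8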

> library(MASS)        # stepAIC() is provided by the MASS package
> stepAIC(fit.full)    # start from the most complex of the candidate models
Start:  AIC=476.59
y ~ x + f

       Df Deviance    AIC
- f     1    84.99 474.77
<none>       84.81 476.59
- x     1    89.48 479.25

Step:  AIC=474.77
y ~ x

       Df Deviance    AIC
<none>       84.99 474.77
- x     1    89.51 477.29

Call:  glm(formula = y ~ x, family = poisson, data = d)

Coefficients:
(Intercept)            x  
    1.29172      0.07566  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      89.51 
Residual Deviance: 84.99        AIC: 474.8

stepAIC() searches over simplifications of the model step by step and returns the one with the smallest AIC; here the result is again the model y ~ x.

9. Summary

A generalized linear model (GLM) extends the classical normal-error models (ANOVA and the like) to other probability distributions; in R it is fitted with glm(). A GLM is specified by three parts: a probability distribution, a link function and a linear predictor. In this lecture the distribution was the Poisson distribution (family = poisson) and the link function was the log link, that is, Poisson regression with glm().

A fitted GLM can be used for prediction of the mean response. The deviance is the maximized log-likelihood multiplied by -2, and AIC = deviance + 2 × (number of parameters); among the candidate models, the one with the smallest AIC is chosen. The same framework covers GLMs with other distributions and link functions, such as logistic regression with the logit link.