講義のーと : データ解析のための統計モデリング (Lecture notes: Statistical modeling for data analysis), Part 2

Title: 講義のーと : データ解析のための統計モデリング (Lecture notes: Statistical modeling for data analysis)
Author(s): 久保, 拓弥 (Kubo, Takuya)
Issue Date: 2008
Doc URL: http://hdl.handle.net/2115/49477
Type: learningobject
Note: This lecture material can also be downloaded from the author's web page, http://hosho.ees.hokudai.ac.jp/~kub
Note (URL): http://hosho.ees.hokudai.ac.jp/~kubo/ce/EesLecture20
Additional Information: There are other files related to this item in HUSCAP.
File Information: kubostat2008b.pdf (Part 2)
Instructions for use: Hokkaido University Collection of Scholarly and Aca

2008-10-30 (2012-07-01 10:11 version), p. 1

Statistical modeling for data analysis (October–November 2008, 5 (+2) sessions), Part 2 (2008-10-30)
kubo@ees.hokudai.ac.jp
http://hosho.ees.hokudai.ac.jp/~kubo/ce/eeslecture2008.html
http://hosho.ees.hokudai.ac.jp/~kubo/ce/iwanamibook.html

Contents:
1. Examining the example data in R
2. What is a probability distribution?
3. Samples from a probability distribution
4. Maximum likelihood estimation of λ
5. The binomial distribution
6. The normal distribution
7. Which probability distribution should I use?

1. Examining the example data in R

Today's tool is R, free software for statistical computing. The example data, distributed to participants by e-mail, are 50 counts stored in R as a vector object named data:

> data
 [1] 2 2 4 6 4 5 2 3 1 2 0 4 3 3 3 3 4 2 7 2 4 3 3 3 4
[26] 3 7 5 3 1 7 6 4 6 5 2 4 7 2 2 6 2 4 5 4 5 1 3 2 3

Every value in data is a non-negative integer (0, 1, 2, …): this is count data. The [1] and [26] at the left edge of the output are element indices: the first output line starts with the 1st element of the vector and the second line with the 26th.

Because data is a vector object, it can be handed directly to R functions. First confirm that it really contains 50 values, then summarize it:

> length(data)  # how many elements are in data?
[1] 50
> summary(data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    2.00    3.00    3.56    4.75    7.00

(In R, everything after # on a line is a comment.) length() returns the number of elements, 50. In the summary() output, Min. and Max. are the smallest and largest values of data, and 1st Qu., Median, 3rd Qu. are the 25%, 50%, and 75% quantiles. The Median is the middle value of the sorted data; the Mean is the average, 3.56. The spread around the mean is measured by the variance:

> var(data)
[1] 2.9861

and by its square root, the standard deviation (SD):

> sqrt(var(data))
[1] 1.7280
> sd(data)
[1] 1.7280
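The lecture computes these summaries in R. As a cross-check (an addition, not part of the original notes), the same statistics can be reproduced with Python's standard library; like R's var() and sd(), statistics.variance() and statistics.stdev() use the n − 1 divisor:

```python
import statistics

# The 50 counts from the lecture's `data` vector.
data = [2, 2, 4, 6, 4, 5, 2, 3, 1, 2, 0, 4, 3, 3, 3, 3, 4, 2, 7, 2, 4, 3, 3, 3, 4,
        3, 7, 5, 3, 1, 7, 6, 4, 6, 5, 2, 4, 7, 2, 2, 6, 2, 4, 5, 4, 5, 1, 3, 2, 3]

mean = statistics.mean(data)      # R: mean(data) -> 3.56
var = statistics.variance(data)   # R: var(data)  -> 2.9861 (sample variance, divisor n - 1)
sd = statistics.stdev(data)       # R: sd(data)   -> 1.7280

print(mean, round(var, 4), round(sd, 4))
```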

How are these 50 counts distributed? Either summary(as.factor(data)) or table(data) tabulates the frequencies:

> summary(as.factor(data))
 0  1  2  3  4  5  6  7
 1  3 11 12 10  5  4  4
> table(data)
 0  1  2  3  4  5  6  7
 1  3 11 12 10  5  4  4

A histogram of data shows the same thing graphically. The breaks argument of hist() sets the bin boundaries; seq(-0.5, 8.5, 1) generates the sequence -0.5, 0.5, 1.5, …, 8.5, so each integer value gets its own bin:

> hist(data, breaks = seq(-0.5, 8.5, 1))

[Figure: "Histogram of data"; Frequency (0–12) against data (0–8).]

These 50 counts scatter around their mean of 3.56, and the variation looks probabilistic, so let us model it with a probability distribution.
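For comparison (again an illustrative addition, not in the original notes), the frequency table produced by table(data) can be reproduced in Python with collections.Counter:

```python
from collections import Counter

# The 50 counts from the lecture's `data` vector.
data = [2, 2, 4, 6, 4, 5, 2, 3, 1, 2, 0, 4, 3, 3, 3, 3, 4, 2, 7, 2, 4, 3, 3, 3, 4,
        3, 7, 5, 3, 1, 7, 6, 4, 6, 5, 2, 4, 7, 2, 2, 6, 2, 4, 5, 4, 5, 1, 3, 2, 3]

counts = Counter(data)                 # value -> frequency, like R's table(data)
freq = [counts[v] for v in range(8)]   # frequencies of the values 0, 1, ..., 7
print(freq)  # [1, 3, 11, 12, 10, 5, 4, 4]
```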

What probability distribution might describe such count data? A natural candidate is the Poisson distribution. Compute the Poisson probabilities with the mean set to 3.56, the sample mean of data:

> y <- 0:9
> prob <- dpois(y, lambda = 3.56)
> plot(y, prob, type = "b", lty = 2)

[Figure: prob (0.00–0.20) against y (0–8).]

> data.frame(y, prob)  # show y and prob side by side
    y       prob
1   0 0.02843882
2   1 0.10124222
3   2 0.18021114
4   3 0.21385056
5   4 0.19032700
6   5 0.13551282
7   6 0.08040427
8   7 0.04089132
9   8 0.01819664
10  9 0.00719778

For each count y (y ∈ {0, 1, 2, 3, …}), dpois(y, lambda = 3.56) returns the probability (prob) that a Poisson random variable with mean 3.56 takes the value y.
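dpois() is just the Poisson probability function evaluated at each y. A hand-rolled Python version (a sketch for illustration; the name dpois merely mimics the R function) reproduces the table above:

```python
import math

def dpois(y, lam):
    """Poisson probability p(y | lambda) = lambda**y * exp(-lambda) / y!."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

prob = [dpois(y, 3.56) for y in range(10)]
for y, p in enumerate(prob):
    print(y, round(p, 8))
# e.g. p(0 | 3.56) ≈ 0.0284 and p(3 | 3.56) ≈ 0.2139, matching R's dpois output
```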

Under this Poisson distribution, the probability of observing a 0 is about 0.03 and of observing a 3 about 0.21. How well does it match our 50 observations? Redraw the histogram and overlay the expected frequencies, i.e., prob multiplied by the sample size 50:

> # draw data, y and prob together
> hist(data, breaks = seq(-0.5, 8.5, 1))
> lines(y, prob * 50, type = "b", lty = 2)  # expected count = 50 * prob

[Figure: "Histogram of data" with the Poisson expected frequencies (dashed) overlaid; Frequency (0–12) against data (0–8).]

The agreement looks reasonable. But why did we set the Poisson mean to 3.56, the sample mean? Is the sample mean really a sensible choice for this parameter? We return to this question in the section on maximum likelihood estimation.

2. What is a probability distribution?

The Poisson distribution is defined by the probability function

p(y \mid \lambda) = \frac{\lambda^y \exp(-\lambda)}{y!}

where p(y | λ) is the probability that the random variable takes the value y, given the parameter λ (for example, p(2 | λ) is the probability that y = 2). Here exp(−λ) = e^{−λ} (e = 2.7182…), and y! is the factorial of y (e.g., 4! = 1 × 2 × 3 × 4). The possible values are y = 0, 1, 2, 3, …, and the probabilities over all of them sum to one:

\sum_{y=0}^{\infty} p(y \mid \lambda) = 1

The mean of the distribution is λ (λ ≥ 0), its only parameter. Let us see how the shape changes as λ varies:

> y <- 0:20
> plot(y, dpois(y, lambda = 3.5), type = "b", lty = 2, pch = 21,
+     ylab = "prob")
> lines(y, dpois(y, lambda = 7.7), type = "b", lty = 2, pch = 23)
> lines(y, dpois(y, lambda = 15.1), type = "b", lty = 2, pch = 24)
> legend("topright", legend = c(3.5, 7.7, 15.1), pch = c(21, 23, 24),
+     title = "lambda", cex = 0.7)

(plot() opens a new figure, while lines() adds curves to the existing one. In R transcripts, > is the prompt and + marks a continued line.)
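The two facts just stated, that the probabilities sum to one and that the mean equals λ (for a Poisson distribution the variance equals λ as well), can be checked numerically. This Python sketch (an illustrative addition, not from the notes) truncates the infinite sum at y = 120, which is ample for these values of λ:

```python
import math

def dpois(y, lam):
    # p(y | lambda) = lambda**y * exp(-lambda) / y!
    return lam ** y * math.exp(-lam) / math.factorial(y)

for lam in (3.5, 7.7, 15.1):
    ys = range(120)  # the tail beyond y = 120 is negligible for these lambdas
    total = sum(dpois(y, lam) for y in ys)
    mean = sum(y * dpois(y, lam) for y in ys)
    var = sum((y - mean) ** 2 * dpois(y, lam) for y in ys)
    print(lam, round(total, 9), round(mean, 6), round(var, 6))
    # total ≈ 1, mean ≈ lambda, var ≈ lambda
```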

[Figure: Poisson probabilities for λ = 3.5, 7.7, 15.1; prob (0.00–0.20) against y (0–20).]

As λ increases, the distribution shifts to the right and becomes wider. Why might the Poisson distribution suit our data (counts y = 0, 1, 2, …)? Compare the sample mean and sample variance:

> mean(data)  # sample mean of data
[1] 3.56
> var(data)   # sample variance of data
[1] 2.986122

A Poisson distribution has mean and variance both equal to λ; here the sample mean (3.56) and the sample variance (2.99) are indeed close. (Poisson-distributed counts arise when independent events are counted over a fixed interval.)

The Poisson distribution is thus a reasonable first model when (1) the data are counts 0, 1, 2, … with no upper bound and (2) the variance is about equal to the mean. When the variance is much larger than the mean, a different distribution, such as the negative binomial distribution, is more appropriate (discussed in a later lecture).

3. Samples from a probability distribution

Where did data come from in the first place?

> data
 [1] 2 2 4 6 4 5 2 3 1 2 0 4 3 3 3 3 4 2 7 2 4 3 3 3 4
[26] 3 7 5 3 1 7 6 4 6 5 2 4 7 2 2 6 2 4 5 4 5 1 3 2 3

Think of these 50 values as random numbers: a sample drawn (by sampling) from some "true" probability distribution (Figure 2).

[Figure 2: a "true" probability distribution (prob, 0.00–0.20, against y, 0–8) generating the 50 observed counts 2 2 4 6 4 5 2 3 1 2 0 4 3 3 3 3 4 2 7 2 4 3 3 3 4 3 7 5 3 1 7 6 4 6 5 2 4 7 2 2 6 2 4 5 4 5 1 3 2 3.]

Estimation runs in the opposite direction (Figure 3): from the 50 observed counts back to the unknown distribution that generated them.

[Figure 3: the 50 observed counts pointing back to an unknown distribution marked "?"; prob (0.00–0.20) against y (0–8).]

The logic of estimation: 1. assume the data were generated from some probability distribution X; 2. using the data, infer ("estimate") the parameters of X, i.e., the quantities (such as λ) that fix the shape of X. In our example the data are the 50 values in data.

In R, Poisson random numbers are generated with rpois(). Suppose the "true" distribution behind data was a Poisson distribution with λ = 3.5:

> data <- rpois(50, lambda = 3.5)
> data
 [1] 2 2 4 6 4 5 2 3 1 2 0 4 3 3 3 3 4 2 7 2 4 3 3 3 4
[26] 3 7 5 3 1 7 6 4 6 5 2 4 7 2 2 6 2 4 5 4 5 1 3 2 3

The sample mean 3.56 is close to, but not equal to, the true λ = 3.5. Each new sample of 50 gives a different mean:

> y <- rpois(50, lambda = 3.5); print(y); print(mean(y))
 [1] 5 6 2 5 2 4 3 4 4 6 4 1 3 5 5 1 3 1 5 2 0 2 0 4 3
[26] 2 6 0 2 3 6 3 2 3 1 2 6 4 2 2 3 4 0 6 5 5 4 1 5 5
[1] 3.24
> y <- rpois(50, lambda = 3.5); print(y); print(mean(y))
 [1] 1 4 2 2 4 1 3 1 2 5 4 3 5 7 1 1 2 2 3 9 0 3 5 3 5
[26] 3 3 3 4 4 4 4 3 0 4 9 3 1 7 3 2 1 2 4 3 0 6 2 2 2
[1] 3.14
> y <- rpois(50, lambda = 3.5); print(y); print(mean(y))
 [1]  5  7  2  2  4  3  5  4  4  3  3  7  5  3  5  2  5
[18]  6  2  4  4  2  3  9  1  6  0  3  3 11  1  2  2  3
[35]  3  3  2  2  3  4  3  4  6  5  4  4  4  4  4  2
[1] 3.76

So sample means (3.24, 3.14, 3.76, …) fluctuate around the true value 3.5.

4. Maximum likelihood estimation of λ

Given only the observed data, how should λ be chosen?
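R's rpois() hides the sampling mechanism. A minimal Python stand-in (illustrative only; inversion of the CDF, adequate for small λ, with a fixed seed for reproducibility) shows the same behavior of sample means fluctuating around λ:

```python
import math
import random

def rpois(n, lam, rng):
    """Draw n Poisson(lam) variates by CDF inversion (a stand-in for R's rpois)."""
    out = []
    for _ in range(n):
        u = rng.random()
        y = 0
        p = math.exp(-lam)   # p(0 | lam)
        cdf = p
        while u > cdf:
            y += 1
            p *= lam / y     # recurrence: p(y) = p(y - 1) * lam / y
            cdf += p
        out.append(y)
    return out

rng = random.Random(1)  # fixed seed so the run is reproducible
for _ in range(3):
    y = rpois(50, 3.5, rng)
    print(sum(y) / 50)   # each sample mean fluctuates around the true 3.5
```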

Back to estimation: 1. all we have is data; 2. its sample mean is 3.56. Why is 3.56 a good estimate of λ? The standard answer is maximum likelihood estimation. Write the observations in data as {y_i} = {y_1 = 2, y_2 = 2, y_3 = 4, …, y_49 = 2, y_50 = 3}. Under a Poisson model with parameter λ, observation y_i has probability

p(y_i \mid \lambda) = \frac{\lambda^{y_i} \exp(-\lambda)}{y_i!}

and the likelihood of λ given the whole data set {y_i} is the product over the 50 (independent) observations:

L(\lambda \mid \{y_i\}) = p(y_1 \mid \lambda) \times p(y_2 \mid \lambda) \times \cdots \times p(y_{50} \mid \lambda)
                        = \prod_{i=1}^{50} p(y_i \mid \lambda)
                        = \prod_{i=1}^{50} \frac{\lambda^{y_i} \exp(-\lambda)}{y_i!}

The likelihood is a function of λ: it says how plausible each candidate value of λ is, given the data. Since a product of many small probabilities is awkward to handle, we work with the log likelihood instead:

\log L(\lambda \mid \{y_i\}) = \sum_{i=1}^{50} \left\{ y_i \log \lambda - \lambda - \sum_{k=1}^{y_i} \log k \right\}

Because the logarithm is monotonic, the λ that maximizes log L(λ | {y_i}) also maximizes L(λ | {y_i}); this maximizer is written λ̂.

Let us plot the log likelihood of data as a function of λ:

> lambda <- seq(2, 5, 0.1)
> likelihood <- function(lambda) sum(dpois(data, lambda, log = TRUE))
> plot(
+     lambda,
+     sapply(lambda, likelihood),
+     type = "l",
+     xlab = "lambda",
+     ylab = "log likelihood"
+ )

[Figure: log likelihood (about −120 to −105) against lambda (2.0–5.0), with a single peak.]

The maximum is where the derivative with respect to λ vanishes:

\frac{\partial \log L(\lambda \mid \{y_i\})}{\partial \lambda}
    = \sum_{i=1}^{50} \left\{ \frac{y_i}{\lambda} - 1 \right\}
    = \frac{1}{\lambda} \sum_{i=1}^{50} y_i - 50

Setting this to zero at λ = λ̂,

\frac{1}{\hat{\lambda}} \sum_{i=1}^{50} y_i - 50 = 0
\quad\Rightarrow\quad
\hat{\lambda} = \frac{\sum_{i=1}^{50} y_i}{50} = \text{(sample mean of the } y_i\text{)}

As a function of the data {y_i}, λ̂ = Σ y_i / 50 is the maximum likelihood estimator; plugging in the observed values {y_1 = 2, y_2 = 2, y_3 = 4, …, y_49 = 2, y_50 = 3} gives λ̂ = 3.56, the maximum likelihood estimate. This is why setting the Poisson mean to the sample mean of data was the right choice.
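The same maximization can be checked numerically without calculus: evaluate the Poisson log likelihood of data on a fine grid of λ values and take the best one (a Python sketch added for illustration; math.lgamma(y + 1) plays the role of log y!):

```python
import math

# The 50 counts from the lecture's `data` vector.
data = [2, 2, 4, 6, 4, 5, 2, 3, 1, 2, 0, 4, 3, 3, 3, 3, 4, 2, 7, 2, 4, 3, 3, 3, 4,
        3, 7, 5, 3, 1, 7, 6, 4, 6, 5, 2, 4, 7, 2, 2, 6, 2, 4, 5, 4, 5, 1, 3, 2, 3]

def log_likelihood(lam):
    # log L(lam | data) = sum_i [ y_i * log(lam) - lam - log(y_i!) ]
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in data)

grid = [2 + 0.001 * k for k in range(3001)]   # lambda from 2 to 5 in steps of 0.001
best = max(grid, key=log_likelihood)
print(round(best, 3), sum(data) / len(data))  # both 3.56: the MLE equals the sample mean
```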

(Aside: for data whose variance exceeds what the model allows (overdispersion; see Part 5), a quasi-likelihood is sometimes maximized instead of a true likelihood; a related construction is the pseudo-likelihood.)

5. The binomial distribution

Count data do not always run 0, 1, 2, … without limit. When y counts "successes" out of N trials, y is bounded: y ∈ {0, 1, 2, …, N}. The appropriate model is then the binomial distribution

p(y \mid N, q) = \binom{N}{y} q^y (1 - q)^{N - y}

where q is the probability of success in a single trial. Its shape for N = 10 and several values of q:

> y <- 0:10
> plot(y, dbinom(y, 10, prob = 0.1), type = "b", lty = 2, pch = 21,
+     ylab = "prob")
> lines(y, dbinom(y, 10, prob = 0.3), type = "b", lty = 2, pch = 23)
> lines(y, dbinom(y, 10, prob = 0.6), type = "b", lty = 2, pch = 24)
> legend("topright", legend = c(0.1, 0.3, 0.6), pch = c(21, 23, 24),
+     title = "q", cex = 0.7)

[Figure 4: binomial probabilities for N = 10 and q = 0.1, 0.3, 0.6; prob (0.0–0.4) against y (0–10).]

(The binomial distribution returns on 11/10, when generalized linear models (GLM) are introduced.)
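As with the Poisson case, the binomial probability function is easy to verify directly. The sketch below (an illustrative Python addition; math.comb(n, y) is the binomial coefficient C(N, y)) checks that the probabilities sum to one and that the mean is N·q:

```python
import math

def dbinom(y, n, q):
    """Binomial probability p(y | N, q) = C(N, y) * q**y * (1-q)**(N-y)."""
    return math.comb(n, y) * q ** y * (1 - q) ** (n - y)

for q in (0.1, 0.3, 0.6):
    probs = [dbinom(y, 10, q) for y in range(11)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    print(q, round(total, 9), round(mean, 9))  # total = 1, mean = 10 * q
```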

6. The normal distribution

Not all data are counts: measured quantities such as {1.5, 3.2, 7.7, …} are continuous. The standard model for continuous data is the normal distribution (also called the Gaussian distribution). It has two parameters: the mean µ, which sets the center, and the standard deviation σ, which sets the spread around µ (roughly µ ± σ):

> y <- seq(-5, 5, 0.1)
> plot(y, dnorm(y, mean = 0, sd = 1), type = "l",
+     ylab = "prob.density")                     # standard normal
> lines(y, dnorm(y, mean = 0, sd = 3), lty = 2)  # larger sd
> lines(y, dnorm(y, mean = 3, sd = 1), lty = 3)  # shifted mean

[Figure: three normal densities; prob.density (0.0–0.4) against y (−4 to 4).]

The probability density function of the normal distribution with parameters µ and σ is

p(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y - \mu)^2}{2\sigma^2} \right)

Note that p(y | µ, σ) is a probability density, not a probability. For example, suppose heights are normally distributed with mean 170 cm and standard deviation 5 cm; the probability of a height between 177 cm and 178 cm is obtained from the cumulative distribution:

> pnorm(178, mean = 170, sd = 5) - pnorm(177, mean = 170, sd = 5)
[1] 0.02595737
> dnorm(177.5, mean = 170, sd = 5) * 1.0  # density at the midpoint × interval width
[1] 0.02590352

The second line shows that (density at the midpoint) × (interval width) is a good approximation of the interval probability. The probability of a height of exactly 178 cm, on the other hand, is zero, even though the density there is not:

> pnorm(178, mean = 170, sd = 5) - pnorm(178, mean = 170, sd = 5)
[1] 0
> dnorm(178, mean = 170, sd = 5)
[1] 0.02218417

For continuous data {y_i} (i = 1, 2, …, N) the likelihood is therefore built from densities multiplied by the measurement precision Δy (e.g., heights recorded to 0.1 cm):

L(\mu, \sigma \mid \{y_i\}) = \prod_{i=1}^{N} p(y_i \mid \mu, \sigma)\,\Delta y
    = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - \mu)^2}{2\sigma^2} \right) \Delta y

so the log likelihood is

\log L(\mu, \sigma \mid \{y_i\}) = -0.5\,N \log(2\pi\sigma^2) - \sum_{i=1}^{N} \frac{(y_i - \mu)^2}{2\sigma^2} + N \log \Delta y

The term N log Δy does not depend on µ or σ, so it does not affect the maximization. Maximizing over µ amounts to minimizing the sum of squares \sum_{i=1}^{N} (y_i - \mu)^2 (i.e., least squares).
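R's pnorm() and dnorm() have simple closed forms; a Python sketch using math.erf (added here for illustration, not part of the notes) reproduces the two height probabilities above:

```python
import math

def dnorm(y, mean, sd):
    """Normal density p(y | mu, sigma)."""
    return math.exp(-((y - mean) ** 2) / (2 * sd ** 2)) / math.sqrt(2 * math.pi * sd ** 2)

def pnorm(y, mean, sd):
    """Normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf((y - mean) / (sd * math.sqrt(2))))

exact = pnorm(178, 170, 5) - pnorm(177, 170, 5)
approx = dnorm(177.5, 170, 5) * 1.0   # density at the midpoint times width 1 cm
print(exact)   # R gives 0.02595737
print(approx)  # R gives 0.02590352
```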

This gives µ̂ = (sample mean of the y_i) as the maximum likelihood estimate of µ; σ̂ is obtained similarly from ∂ log L / ∂σ = 0.

7. Which probability distribution should I use?

When choosing a probability distribution for a data set, ask:

1. What kind of quantity is it? Discrete (counts, or two categories) or continuous (measurements such as {0.56, 1.33, 12.4, 9.84, …})?
2. What is its range? {0, 1, …, N}; {0, 1, …, ∞}; an interval [y_min, y_max]; [−∞, +∞]; …
3. How does the variance relate to the mean?

Typical correspondences between range and distribution:

- counts with an upper bound, {0, 1, 2, …, N}: binomial distribution
- counts without an upper bound, {0, 1, 2, …}: Poisson distribution (negative binomial if the variance exceeds the mean)
- continuous, [−∞, +∞]: normal distribution
- continuous, [0, +∞]: gamma distribution
- continuous, [0, +∞]: log-normal distribution
- continuous, [0, 1]: beta distribution

Each of these distributions has an R function for generating random numbers and a corresponding family for model fitting with glm():

distribution         random numbers   fitting with glm()
binomial             rbinom()         glm(family = binomial)
Poisson              rpois()          glm(family = poisson)
negative binomial    rnbinom()        glm.nb() (MASS package)
gamma                rgamma()         glm(family = Gamma)
normal               rnorm()          glm(family = gaussian)

These glm() fits are the subject of the next lecture, Part 3: generalized linear models (GLM). There the binomial distribution, with N trials and per-trial success probability p and support {0, 1, 2, …, N}, plays a central role; note that with N = 1 its support is {0, 1} (the Bernoulli case).