12/1 ( ) GLM, R MCMC, WinBUGS 12/2 ( ) WinBUGS WinBUGS 12/2 ( ) : 12/3 ( ) :? ( :51 ) 2/ 71

Similar documents
kubo2015ngt6 p.2 ( ( (MLE 8 y i L(q q log L(q q 0 ˆq log L(q / q = 0 q ˆq = = = * ˆq = 0.46 ( 8 y 0.46 y y y i kubo (ht

kubostat1g p. MCMC binomial distribution q MCMC : i N i y i p(y i q = ( Ni y i q y i (1 q N i y i, q {y i } q likelihood q L(q {y i } = i=1 p(y i q 1

/ *1 *1 c Mike Gonzalez, October 14, Wikimedia Commons.

kubostat2017j p.2 CSV CSV (!) d2.csv d2.csv,, 286,0,A 85,0,B 378,1,A 148,1,B ( :27 ) 10/ 51 kubostat2017j (

/22 R MCMC R R MCMC? 3. Gibbs sampler : kubo/

kubo2017sep16a p.1 ( 1 ) : : :55 kubo ( ( 1 ) / 10

一般化線形 (混合) モデル (2) - ロジスティック回帰と GLMM

講義のーと : データ解析のための統計モデリング. 第2回

kubostat2017c p (c) Poisson regression, a generalized linear model (GLM) : :

講義のーと : データ解析のための統計モデリング. 第3回

(2/24) : 1. R R R

: (GLMM) (pseudo replication) ( ) ( ) & Markov Chain Monte Carlo (MCMC)? /30

講義のーと : データ解析のための統計モデリング. 第5回

kubostat2017e p.1 I 2017 (e) GLM logistic regression : : :02 1 N y count data or

: Bradley-Terry Burczyk

k3 ( :07 ) 2 (A) k = 1 (B) k = 7 y x x 1 (k2)?? x y (A) GLM (k

,, Poisson 3 3. t t y,, y n Nµ, σ 2 y i µ + ɛ i ɛ i N0, σ 2 E[y i ] µ * i y i x i y i α + βx i + ɛ i ɛ i N0, σ 2, α, β *3 y i E[y i ] α + βx i


kubostat2017b p.1 agenda I 2017 (b) probability distribution and maximum likelihood estimation :

Dirichlet process mixture Dirichlet process mixture 2 /40 MIRU2008 :

分布

kubostat2018d p.2 :? bod size x and fertilization f change seed number? : a statistical model for this example? i response variable seed number : { i

X X X Y R Y R Y R MCAR MAR MNAR Figure 1: MCAR, MAR, MNAR Y R X 1.2 Missing At Random (MAR) MAR MCAR MCAR Y X X Y MCAR 2 1 R X Y Table 1 3 IQ MCAR Y I

03.Œk’ì

日本内科学会雑誌第97巻第7号

日本内科学会雑誌第98巻第4号

/ 55 2 : : (GLM) 1. 1/23 ( )? GLM? (GLM ) 2.! 1/25 ( ) ffset (GLM )

Stanによるハミルトニアンモンテカルロ法を用いたサンプリングについて

tokei01.dvi

医系の統計入門第 2 版 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 第 2 版 1 刷発行時のものです.

確率論と統計学の資料

2301/1     目次・広告



N cos s s cos ψ e e e e 3 3 e e 3 e 3 e

バイオインフォマティクス特論4

& 3 3 ' ' (., (Pixel), (Light Intensity) (Random Variable). (Joint Probability). V., V = {,,, V }. i x i x = (x, x,, x V ) T. x i i (State Variable),

10:30 12:00 P.G. vs vs vs 2

Ł\”ƒ-2005

第90回日本感染症学会学術講演会抄録(I)

Ł\”ƒ.dvi

日本内科学会雑誌第102巻第4号

Kullback-Leibler

kubostat2018a p.1 統計モデリング入門 2018 (a) The main language of this class is 生物多様性学特論 Japanese Sorry An overview: Statistical Modeling 観測されたパターンを説明する統計モデル

統計モデリング入門 2018 (a) 生物多様性学特論 An overview: Statistical Modeling 観測されたパターンを説明する統計モデル 久保拓弥 (北海道大 環境科学) 統計モデリング入門 2018a 1

1 15 R Part : website:

ii 3.,. 4. F. (), ,,. 8.,. 1. (75%) (25%) =7 20, =7 21 (. ). 1.,, (). 3.,. 1. ().,.,.,.,.,. () (12 )., (), 0. 2., 1., 0,.

Multivariate Realized Stochastic Volatility Models with Dynamic Correlation and Skew Distribution: Bayesian Analysis and Application to Risk Managemen


DSGE Dynamic Stochastic General Equilibrium Model DSGE 5 2 DSGE DSGE ω 0 < ω < 1 1 DSGE Blanchard and Kahn VAR 3 MCMC

Rによる計量分析:データ解析と可視化 - 第3回 Rの基礎とデータ操作・管理

1 (Berry,1975) 2-6 p (S πr 2 )p πr 2 p 2πRγ p p = 2γ R (2.5).1-1 : : : : ( ).2 α, β α, β () X S = X X α X β (.1) 1 2

36

seminar0220a.dvi

O1-1 O1-2 O1-3 O1-4 O1-5 O1-6


: (EQS) /EQUATIONS V1 = 30*V F1 + E1; V2 = 25*V *F1 + E2; V3 = 16*V *F1 + E3; V4 = 10*V F2 + E4; V5 = 19*V99

JMP V4 による生存時間分析

C el = 3 2 Nk B (2.14) c el = 3k B C el = 3 2 Nk B

プログラム

(5) 75 (a) (b) ( 1 ) v ( 1 ) E E 1 v (a) ( 1 ) x E E (b) (a) (b)

放射線専門医認定試験(2009・20回)/HOHS‐05(基礎二次)

Transcription:

2010-12-02 (2010 12 02 10 :51 ) 1/ 71 GCOE 2010-12-02 WinBUGS kubo@ees.hokudai.ac.jp http://goo.gl/bukrb

12/1 ( ) GLM, R MCMC, WinBUGS 12/2 ( ) WinBUGS WinBUGS 12/2 ( ) : 12/3 ( ) :? 2010-12-02 (2010 12 02 10 :51 ) 2/ 71

2010-12-02 (2010 12 02 10 :51 ) 3/ 71 ( ) random effects BUGS

2010-12-02 (2010 12 02 10 :51 ) 4/ 71

2010-12-02 (2010 12 02 10 :51 ) 5/ 71 GLM (1)? y = 0, 1, 2, 3, (y ) (family = poisson) y = {0, 1}, y = {0, 1, 2,, N} (family = binomial) (family = Gamma) (family = gaussian)

R : glm() ( ) rbinom() glm(family = binomial) rbinom() glm(family = binomial) rpois() glm(family = poisson) rnbinom() glm.nb() ( ) rgamma() glm(family = gamma) rnorm() glm(family = gaussian) glm() glm.nb() MASS library GLM 2010-12-02 (2010 12 02 10 :51 ) 6/ 71

2010-12-02 (2010 12 02 10 :51 ) 7/ 71 GLM (2)!! GLMM! GLMM/ random effects GLM

2010-12-02 (2010 12 02 10 :51 ) 8/ 71 (Poisson distribution) (Binomial distribution) ( ) (Normal distribution, Gaussian ) (Gamma distribution)

2010-12-02 (2010 12 02 10 :51 ) 9/ 71

2010-12-02 (2010 12 02 10 :51 ) 10/ 71

2010-12-02 (2010 12 02 10 :51 ) 11/ 71 : + WinBUGS 1. : GLMM 2.

1. : GLMM

: 8 ( ) ( ) : N i = 8 i y i = 3 : 100 800 :? 2010-12-02 (2010 12 02 10 :51 ) 13/ 71

2010-12-02 (2010 12 02 10 :51 ) 14/ 71 :?! 100 800 403 0.50 0 5 10 15 20 25! 0 2 4 6 8 y i

2010-12-02 (2010 12 02 10 :51 ) 15/ 71 (overdispersion) 0 5 10 15 20 25 0 2 4 6 8 y i 0.5 : overdispersion :?

? : 1. fixed effects 2. random effects 2010-12-02 (2010 12 02 10 :51 ) 16/ 71

2010-12-02 (2010 12 02 10 :51 ) 17/ 71 fixed random? fixed/random effects : 1. fixed effects : ( ) fixed effects 2. random effects : fixed effects ( ) random effects

2010-12-02 (2010 12 02 10 :51 ) 18/ 71 : i N i y i ( ) Ni p(y i q i ) = q y i i (1 q i) N i y i, y i q i

2010-12-02 (2010 12 02 10 :51 ) 19/ 71 q i = q(z i ) (logistic) q(z) = 1/{1 + exp( z)} 0.5 1.0 q(z) 4 2 0 2 4 z z i = a + b i a: b i : i ( )

b i 100 101 (a {b 1, b 2,, b 100 }) /! ( )?? ( ) 2010-12-02 (2010 12 02 10 :51 ) 20/ 71

2010-12-02 (2010 12 02 10 :51 ) 21/ 71 : b i s p(b i s) = 1 2πs 2 exp b2 i 2s 2, = 1 = 1.5 = 3 b i 6 4 2 0 2 4 6 {b 1, b 2,, b 100 }

2010-12-02 (2010 12 02 10 :51 ) 22/ 71 b i s = 1 = 1.5 = 3 b i 6 4 2 0 2 4 6 s b i s b i y i

(C) : b i s s 2010-12-02 (2010 12 02 10 :51 ) 23/ 71 b i? (A) (B) (C)!? s -4-2 0 2 4-4 -2 0 2 4-4 -2 0 2 4 (A) : b i (B) : b i ( -5 5 )

y i = 2 b i b i p(b i y i = 2, s) s p(y i = 2 b i ) b i p(b i s) 6 4 2 0 2 4 6 6 4 2 0 2 4 6 6 4 2 0 2 4 6 s b i 2010-12-02 (2010 12 02 10 :51 ) 24/ 71

y i {2, 3, 5} b i s s 6 4 2 0 2 4 6 6 4 2 0 2 4 6 6 4 2 0 2 4 6 s b i 2010-12-02 (2010 12 02 10 :51 ) 25/ 71

2010-12-02 (2010 12 02 10 :51 ) 26/ 71? データ N[i] Y[i] 胚珠数中の生存 tau は hyper parameter 二項分布生存確率 q[i] 全体の平均 a 無情報事前分布 植物の個体差 b[i] 事前分布 tau 個体差のばらつき 無情報事前分布 ( 超事前分布 )

2010-12-02 (2010 12 02 10 :51 ) 27/ 71 データ N[i] Y[i] 胚珠数中の生存 tau は hyper parameter 二項分布生存確率 q[i] 全体の平均 a 無情報事前分布 植物の個体差 b[i] 事前分布 tau 個体差のばらつき 無情報事前分布 ( 超事前分布 )

? b i s = 0.1? s = 0.1 s individual 1 2 3 posterior prior small large ( ) s 2010-12-02 (2010 12 02 10 :51 ) 28/ 71

2010-12-02 (2010 12 02 10 :51 ) 29/ 71 τ = 1/s 2 s τ (non-informative prior) p(τ ) = τ α 1 e τ β Γ(α)β α, α = β = 10 4

2010-12-02 (2010 12 02 10 :51 ) 30/ 71 (1) τ 0.0 0.2 0.4 0.6 0.8 1.0 ( 1; 1) ( 1; 100) 0 2 4 6 8 10 τ τ

(2) a 0.0 0.1 0.2 0.3 0.4 ( 0; 1) ( 0; 100) -10-5 0 5 10 a (logit) a 2010-12-02 (2010 12 02 10 :51 ) 31/ 71

2010-12-02 (2010 12 02 10 :51 ) 32/ 71 p(a, {b i }, τ ) = 100 i=1 p(y i q(a + b i )) p(a) p(b i τ ) h(τ ) ( ) db i dτ da p(a, {b i }, τ ) 100 i=1 p(y i q(a + b i )) p(a) p(b i τ ) h(τ ) : : p(a, {b i }, τ ) 100 i=1 p(y i q(a + b i )) : p(a) p(b i τ ) h(τ )

2010-12-02 (2010 12 02 10 :51 ) 33/ 71 b i s individual 1 2 3 small hyperprior posterior prior (posterior) large hyperparameter s MCMC

? p(a, {b i }, τ ) 100 i=1 p(y i q(a + b i )) p(a) p(b i τ ) h(τ ) p(a, {b i }, τ ) Markov chain Monte Carlo (MCMC)! WinBUGS 2010-12-02 (2010 12 02 10 :51 ) 34/ 71

Gibbs sampling p(a ) 100 p(y i q(a + b i )) p(a) i=1 p(τ ) 100 i=1 p(b i τ ) h(τ ) p(b 1 ) p(y 1 q(a + b 1 )) p(b 1 τ ) p(b 2 ) p(y 2 q(a + b 2 )) p(b 2 τ ). p(b 100 ) p(y 100 q(a + b 100 )) p(b 100 τ ) 2010-12-02 (2010 12 02 10 :51 ) 35/ 71

2010-12-02 (2010 12 02 10 :51 ) 36/ 71 0 5 10 15 20 25 0 2 4 6 8

2010-12-02 (2010 12 02 10 :51 ) 37/ 71 : 0 5 10 15 20 25 0 2 4 6 8

2010-12-02 (2010 12 02 10 :51 ) 38/ 71 ( ) ( ) ( ) ( ) データ N[i] Y[i] 胚珠数中の生存 tau は hyper parameter 二項分布生存確率 q[i] 全体の平均 a 無情報事前分布 植物の個体差 b[i] 事前分布 tau 個体差のばらつき 無情報事前分布 ( 超事前分布 ) : Markov Chain Monte Carlo (MCMC)

2010-12-02 (2010 12 02 10 :51 ) 39/ 71

2010-12-02 (2010 12 02 10 :51 ) 40/ 71 WinBUGS

2010-12-02 (2010 12 02 10 :51 ) 41/ 71 WinBUGS 1. 2. BUGS (model.bug.txt) 3. R2WBwrapper R (runbugs.r) 4. R runbugs.r (source(runbugs.r) ) 5. bugs

2010-12-02 (2010 12 02 10 :51 ) 42/ 71? データ N[i] Y[i] 胚珠数中の生存 tau は hyper parameter 二項分布生存確率 q[i] 全体の平均 無情報事前分布 p(a, {b i }, τ ) 100 i=1 a 植物の個体差 b[i] 事前分布 tau 個体差のばらつき 無情報事前分布 ( 超事前分布 ) p( q(a + b i )) p(a) p(b i τ ) h(τ )

BUGS model.bug.txt ( ) model{ for (i in 1:N.sample) { Y[i] ~ dbin(q[i], N[i]) # logit(q[i]) <- a + b[i] # q[i] } a ~ dnorm(0, 1.0E-4) # for (i in 1:N.sample) { b[i] ~ dnorm(0, tau) # } tau ~ dgamma(1.0e-4, 1.0E-4) # sigma <- sqrt(1 / tau) # tau SD } 2010-12-02 (2010 12 02 10 :51 ) 43/ 71

2010-12-02 (2010 12 02 10 :51 ) 44/ 71 (hierarchical) random effects (non-informative) fixed effects (subjective) ( )

BUGS BUGS ( ) node ( ) 1. ~ sthochastic node 2. <- deterministic node 2010-12-02 (2010 12 02 10 :51 ) 45/ 71

R2WBwrapper R runbugs.r ( ) source("r2wbwrapper.r") # R2WBwrapper d <- read.csv("data.csv") # clear.data.param() # ( ) set.data("n.sample", nrow(d)) # set.data("n", d$n) # set.data("y", d$y) # 2010-12-02 (2010 12 02 10 :51 ) 46/ 71

R2WBwrapper R runbugs.r ( ) set.param("a", 0) # set.param("sigma", NA) # set.param("b", rep(0, N.sample)) # set.param("tau", 1, save = FALSE) # set.param("p", NA) # post.bugs <- call.bugs( # WinBUGS file = "model.bug.txt", n.iter = 2000, n.burnin = 1000, n.thin = 5 ) 2010-12-02 (2010 12 02 10 :51 ) 47/ 71

WinBUGS post.bugs <- call.bugs( file = "model.bug.txt", # WinBUGS n.iter = 2000, n.burnin = 1000, n.thin = 5 ) default ( ) 3 (n.chains = 3) MCMC sampling ( ) cf. PC MCMC chain 2000 step (n.iter = 2000) 1000 step (n.burnin = 1000) 1001 2000 step 5 step (n.thin = 5) 2010-12-02 (2010 12 02 10 :51 ) 48/ 71

? R source("runbugs.r") WinBUGS MCMC sampling (WinBUGS ) WinBUGS WinBUGS R post.bugs 2010-12-02 (2010 12 02 10 :51 ) 49/ 71

2010-12-02 (2010 12 02 10 :51 ) 50/ 71 R a a?

2010-12-02 (2010 12 02 10 :51 ) 51/ 71 bugs post.bugs (1) plot(post.bugs), R-hat Gelman-Rubin var ˆ ˆR + (ψ y) = W var ˆ + (ψ y) = n 1 n W + 1 n B W : chain variance B : chain variance Gelman et al. 2004. Bayesian Data Analysis. Chapman & Hall/CRC

2010-12-02 (2010 12 02 10 :51 ) 52/ 71 ubothinkpad/public_html/stat/2009/ism/winbugs/model.bug.txt", fit using WinBUGS, 3 chains, each with 1300 i 80% interval for each chain R-hat -10-5 0 5 10 1 1.5 2+ a sigma * b[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] tau * q[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] -10-5 0 5 10 1 1.5 2+ * array truncated for lack of space a sigma * b tau * q 0.5 0-0.5 3.5 3 2.5 10 5 0-5 -10 0.2 0.15 0.1 0.05 1 0.5 0 260 240 deviance 220 200 medians and 80% intervals 1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff a 0.018 0.322-0.621-0.202 0.025 0.233 0.628 1.030 75 sigma 2.980 0.361 2.346 2.738 2.948 3.205 3.752 1.003 590 b[1] -3.800 1.711-7.652-4.776-3.503-2.554-1.193 1.002 1100 b[2] -1.142 0.874-3.003-1.688-1.111-0.530 0.464 1.010 200 b[3] 1.992 1.047 0.169 1.251 1.889 2.665 4.346 1.005 390 b[4] 3.745 1.781 0.975 2.503 3.408 4.751 7.926 1.008 520 b[5] -2.005 1.066-4.257-2.719-1.909-1.257-0.131 1.005 370 b[6] 2.047 1.077 0.147 1.310 1.933 2.716 4.456 1.002 1100 b[7] 3.765 1.763 1.023 2.482 3.593 4.811 7.515 1.000 1200 b[8] 3.782 1.661 1.133 2.591 3.570 4.703 7.621 1.003 640 b[9] -2.049 1.106-4.439-2.745-1.948-1.255-0.218 1.004 470 b[10] -2.028 1.066-4.340-2.655-1.902-1.314-0.175 1.002 750... 2010-12-02 (2010 12 02 10 :51 ) 53/ 71 bugs post.bugs (2) print(post.bugs, digits.summary = 3) 95%

2010-12-02 (2010 12 02 10 :51 ) 54/ 71 mcmc.list post.list <- to.list(post.bugs) plot(post.list[,1:4,], smooth = F),

2010-12-02 (2010 12 02 10 :51 ) 55/ 71

2010-12-02 (2010 12 02 10 :51 ) 56/ 71 mcmc post.mcmc <- to.mcmc(post.bugs) matrix : 0 5 10 15 20 25 0 2 4 6 8

2.

2010-12-02 (2010 12 02 10 :51 ) 58/ 71 : abundance 0 5 10 15 20 25 ( ) 0 10 20 30 40 50 location :

2010-12-02 (2010 12 02 10 :51 ) 59/ 71 : abundance 0 5 10 15 20 25 0 10 20 30 40 50 location ( 95% )

β : β Normal(0, 10 2 ), 2010-12-02 (2010 12 02 10 :51 ) 60/ 71 1 2 3 4 i λ i : y i Poisson(λ i ) λ i ( ) + ( ) : log λ i = β + r i

2010-12-02 (2010 12 02 10 :51 ) 61/ 71 ( ) 1 2 3 4 Conditional Autoregressive (CAR) r i (N i i, J i i ): j J r i Normal( i r j, σ ) σ N i N i σ : τ = 1/σ Gamma(1.0 2, 1.0 2 ) p(β, {r i }, τ {y i }) = p({y i} β, {r i }, τ ) ( ( )dβdr1 dr 50 dτ

τ - τ (σ ) - τ (σ ) - τ ( ) abundance 0 5 10 15 20 25 tau = 1000 0 10 20 30 40 50 abundance 0 5 10 15 20 25 tau = 20 0 5 10 15 20 25 tau = 0.01 0 10 20 30 40 50 0 10 20 30 40 50 location location 2010-12-02 (2010 12 02 10 :51 ) 62/ 71

2010-12-02 (2010 12 02 10 :51 ) 63/ 71 BUGS model { # BUGS for (i in 1:N.site) { Y[i] ~ dpois(mean[i]) # log(mean[i]) <- beta + re[i] # ( ) + ( ) } # re[i] CAR model re[1:n.site] ~ car.normal(adj[], Weights[], Num[], tau) beta ~ dnorm(0, 1.0E-2) # tau ~ dgamma(1.0e-2, 1.0E-2) # } 1 2 3 4

2010-12-02 (2010 12 02 10 :51 ) 64/ 71 abundance 0 5 10 15 20 25 β 0.0 0.5 1.0 1.5 2.0 2.5 3.0 beta τ 0 10 20 30 40 50 location 0 10 20 30 40 50 tau ( 95% )

2010-12-02 (2010 12 02 10 :51 ) 65/ 71 abundance 0 5 10 15 20 25 0 10 20 30 40 50 location GLMM OK?

2010-12-02 (2010 12 02 10 :51 ) 66/ 71 vs abundance 0 5 10 15 20 25 abundance 0 5 10 15 20 25 0 10 20 30 40 50 location 0 10 20 30 40 50 location?

2010-12-02 (2010 12 02 10 :51 ) 67/ 71 ( ):?!! abundance 0 5 10 15 20 25 0 10 20 30 40 50 location (

2010-12-02 (2010 12 02 10 :51 ) 68/ 71 abundance 0 5 10 15 20 25 abundance 0 5 10 15 20 25 0 10 20 30 40 50 location 0 10 20 30 40 50 location!

2010-12-02 (2010 12 02 10 :51 ) 69/ 71 abundance 0 5 10 15 20 25 abundance 0 5 10 15 20 25 0 10 20 30 40 50 0 10 20 30 40 50 location location CAR

2010-12-02 (2010 12 02 10 :51 ) 70/ 71 : abundance 0 5 10 15 20 25 abundance 0 5 10 15 20 25 0 10 20 30 40 50 location 0 10 20 30 40 50 location

2010-12-02 (2010 12 02 10 :51 ) 71/ 71 : 1. : GLMM 2.