.. ( ) (2) GLMM kubo@ees.hokudai.ac.jp I http://goo.gl/rrhzey 2013 08 27 : 2013 08 27 08:29 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 1 / 74
I.1 N k.2 binomial distribution logit link function.3.4! offset.5 GLM!.6.7 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 2 / 74
II.8 r i.9 GLMM (A) f i =C (B) f i =T yi 0 2 4 6 8 0 2 4 6 8 8 9 10 11 12 x i 8 9 10 11 12 x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 3 / 74
6 7 : : 2012 05 18 http://goo.gl/ufq2 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 4 / 74
kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 5 / 74
? (GLM) (Poisson regression) (logistic regression) (linear regression) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 6 / 74
(GLM)??? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 7 / 74
GLM : : e.g., β 1 + β 2 x i : -2 0 2 4 6 0.5 1.0 1.5 2.0 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 8 / 74
GLM logistic : : e.g., β 1 + β 2 x i : logit yi 0 2 4 6 8 8 9 10 11 12 x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 9 / 74
N k 1. N k y i {0, 1, 2,, 8} kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 10 / 74
N k? 8 y! f i C: T: i N i = 8 y i = 3 (alive) (dead) x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 11 / 74
N k data4a.csv CSV (comma separated value) format file R : > d <- read.csv("data4a.csv") or > d <- read.csv( + "http://hosho.ees.hokudai.ac.jp/~kubo/stat/2013/fig/binomial/data4a.csv") d data frame ( ) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 12 / 74
N k data frame d > summary(d) N y x f Min. :8 Min. :0.00 Min. : 7.660 C:50 1st Qu.:8 1st Qu.:3.00 1st Qu.: 9.338 T:50 Median :8 Median :6.00 Median : 9.965 Mean :8 Mean :5.08 Mean : 9.967 3rd Qu.:8 3rd Qu.:8.00 3rd Qu.:10.770 Max. :8 Max. :8.00 Max. :12.440 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 13 / 74
N k > plot(d$x, d$y, pch = c(21, 19)[d$f]) > legend("topleft", legend = c("c", "T"), pch = c(21, 19)) yi 0 2 4 6 8 C T 8 9 10 11 12 x i? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 14 / 74
binomial distribution logit link function 2. binomial distribution logit link function kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 15 / 74
binomial distribution logit link function : N y ( N y p(y N, q) = ( ) N q y (1 q) N y y ) N y p(y i 8, q) 0.0 0.1 0.2 0.3 0.4 q = 0.1 q = 0.8 q = 0.3 0 2 4 6 8 y i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 16 / 74
binomial distribution logit link function (z i : e.g. z i = β 1 + β 2 x i ) 1 q i = logistic(z i ) = 1 + exp( z i ) > logistic <- function(z) 1 / (1 + exp(-z)) # > z <- seq(-6, 6, 0.1) > plot(z, logistic(z), type = "l") q 0.0 0.2 0.4 0.6 0.8 1.0 q = -6-4 -2 0 2 4 6 z 1 1+exp( z) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 17 / 74
binomial distribution logit link function {β 1, β 2 } = {0, 2} (A) β 2 = 2 β 1 (B) β 1 = 0 β 2 q 0.0 0.2 0.4 0.6 0.8 1.0 (A) β 2 = 2 β 1 = 2 β 1 = 0-3 -2-1 0 1 2 3 x β 1 = 3 0.0 0.2 0.4 0.6 0.8 1.0 (B) β 1 = 0 β 2 = 4 β 2 = 2-3 -2-1 0 1 2 3 x β 2 = 1 {β 1, β 2 } x q 0 q 1 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 18 / 74
binomial distribution logit link function logit link function logistic 1 q = 1 + exp( (β 1 + β 2 x)) = logistic(β 1 + β 2 x) logit logit(q) = log q 1 q = β 1 + β 2 x logit logistic logistic logit logit is the inverse function of logistic function, vice versa kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 19 / 74
binomial distribution logit link function R β 1 β 2 (A) f i =C (B) y 0 2 4 6 8 x 7 8 9 10 11 12 0 2 4 6 8 x 7 8 9 10 11 12 > glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)... Coefficients: (Intercept) x ft -19.536 1.952 2.022 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 20 / 74
binomial distribution logit link function : (A) f i =C (B) f i =T yi 0 2 4 6 8 0 2 4 6 8 8 9 10 11 12 x i 8 9 10 11 12 x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 21 / 74
3. kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 22 / 74
? logit(q) = log q 1 q = β 1 + β 2 x + β 3 f + β 4 xf... in case that β 4 < 0, sometimes it predicts... y 0 2 4 6 8 T C 8 9 10 11 12 x kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 23 / 74
glm(y ~ x + f,...) glm(y ~ x + f + x:f,...) (A) (B) y 0 2 4 6 8 T C 0 2 4 6 8 T C 8 9 10 11 12 x 8 9 10 11 12 x kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 24 / 74
! offset 4.! offset kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 25 / 74
! offset? / : 10 3 200 60 3? ( ) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 26 / 74
! offset : N k : : specific leaf area (SLA) : offset! kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 27 / 74
! offset kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 28 / 74
! offset offset : x {0.1, 0.2,, 1.0} 10 glm(..., family = poisson) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 29 / 74
! offset?! x A = /! glm() offset kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 30 / 74
! offset R data.frame: Area, x, y > load("d2.rdata") > head(d, 8) # 8 Area x y 1 0.017249 0.5 0 2 1.217732 0.3 1 3 0.208422 0.4 0 4 2.256265 0.1 0 5 0.794061 0.7 1 6 0.396763 0.1 1 7 1.428059 0.6 1 8 0.791420 0.3 1 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 31 / 74
! offset vs > plot(d$x, d$y / d$area) d$y/d$area 0 5 10 15 0.2 0.4 0.6 0.8 1.0 d$x kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 32 / 74
! offset A vs y > plot(d$area, d$y) d$y 0 5 10 15 0.0 1.0 2.0 3.0 d$area A y kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 33 / 74
! offset x ( ) > plot(d$area, d$y, cex = d$x * 2) d$y 0 5 10 15 0.0 1.0 2.0 3.0 d$area? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 34 / 74
! offset x y x kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 35 / 74
! offset = 1. i y i λ i : y i Pois(λ i ) 2. λ i A i x i λ i = A i exp(β 1 + β 2 x i ) λ i = exp(β 1 + β 2 x i + log(a i )) log(λ i ) = β 1 + β 2 x i + log(a i ) log(a i ) offset ( β ) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 36 / 74
! offset GLM! family: poisson, link : "log" : y ~ x offset : log(area) z = β 1 + β 2 x + log(area) a, b λ log(λ) = z λ = exp(z) = exp(β 1 + β 2 x + log(area)) λ : kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 37 / 74
! offset glm() kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 38 / 74
! offset R glm() > fit <- glm(y ~ x, family = poisson(link = "log"), data = d, offset = log(area)) > print(summary(fit)) Call: glm(formula = y ~ x, family = poisson(link = "log"), data = d, offset = log(area)) (......) Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) 0.321 0.160 2.01 0.044 x 1.090 0.227 4.80 1.6e-06 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 39 / 74
! offset d$y 0 5 10 15 x = 0.9 light environment x = 0.1 dark environment 0.0 1.0 2.0 3.0 d$area glm() kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 40 / 74
! offset : glm() offset offset = exp( ) 0.0 1.0 2.0 3.0 d$y 0 5 10 15 d$area kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 41 / 74
! offset : N k : : specific leaf area (SLA) : offset! kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 42 / 74
GLM! 5. GLM! (overdispersion)? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 43 / 74
GLM! :?! (A) i N i = 8 (B) 100 x i y i y i = 3 x i {2, 3, 4, 5, 6} yi 0 2 4 6 8 2 3 4 5 6 x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 44 / 74
GLM! N y? number of alive seeds y i : : β 1 + β 2 x i : logit 0 2 4 6 8 2 3 4 5 6 number of leaves x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 45 / 74
GLM! GLM! yi 0 2 4 6 8 (A) β 2 (B)! x i = 4 y i 0 1 2 3 4 5 6 2 3 4 5 6 x i 0 2 4 6 8 y i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 46 / 74
6.? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 47 / 74
(overdispersion)? (A) Not or less overdispersed 0 5 10 15 (B) Overdispersed!! 0 2 4 6 8 0 2 4 6 8 y i y i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 48 / 74 0 5 10 15
GLM GLM does not take into account individual differences kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 49 / 74
Almost all real data are overdispersed! kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 50 / 74
7. kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 51 / 74
number of alive seeds y i : : β 1 + β 2 x i + r i : logit 0 2 4 6 8 2 3 4 5 6 number of leaves x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 52 / 74
i r i qi 0.0 0.2 0.4 0.6 0.8 1.0 r i > 0 r i = 0 r i < 0 2 3 4 5 6 x i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 53 / 74
{r i } s = 1.0 s = 1.5-6 -4-2 0 2 4 6 r i s = 3.0 p(r i s) = 1 2πs 2 exp ( r2 i 2s 2 p(r i s) r i r i r i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 54 / 74 )
r i (A) (B) p(r i s) s = 0.5 50 {r i } s = 3.0 I III I II I I -6-4 -2 0 2 4 6 r i II I II I II I I I III I II I I I I I -6-4 -2 0 2 4 6 r i 1 q i = 1+exp( r i ) 0 5 10 15 0 2 4 6 8 y i 2.9 9.9 p(y i q i ) 0 5 10 15 0 2 4 6 8 y i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 55 / 74
> # defining logistic function > logistic <- function(z) { 1 / (1 + exp(-z)) } > # random numbers following binomial distribution > rbinom(100, 8, prob = logistic(0)) > # random numbers following Gausssian distribution > rnorm(100, mu = 0, sd = 0.5) > r <- rnorm(100, mu = 0, sd = 0.5) > # random numbers following...? > rbinom(100, 8, prob = logistic(0 + r)) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 56 / 74
Generalized Linear Mixed Model (GLMM) Mixed : β 1 + β 2 x i + r i fixed effects: β 1 + β 2 x i random effects: +r i fixed? random?? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 57 / 74
: fixed effects random effects kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 58 / 74
: (linear mixed model, LMM) random effects : : kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 59 / 74
global parameter, local parameter? Generalized Linear Mixed Model (GLMM) : β 1 + β 2 x i + r i fixed effects: β 1 + β 2 x i global parameter s global parameter random effects: +r i local parameter i ( ) global/local parameter kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 60 / 74
全データ 個体個体 3 3 のデータのデータ個体 1 のデータ個体個体 3 3 のデータのデータ個体 2 のデータ {r 1, r 2, r 3,..., r 100 } β 1 β 2 local parameter global parameter? kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 61 / 74 s
r i 8. r i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 62 / 74
r i r i local parameters: {r 1, r 2,, r 100 } 100 r i > d <- read.csv("data.csv") > head(d) N y x id 1 8 0 2 1 2 8 1 2 2 3 8 2 2 3 4 8 4 2 4 5 8 1 2 5 6 8 0 2 6 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 63 / 74
r i r i y i ( ) 8 p(y i β 1, β 2 ) = q yi i (1 q i) 8 y i r i i r i L i = p(r i s) = β 1, β 2, s y i ) 1 ( exp r2 i 2πs 2 2s 2 p(y i β 1, β 2, r i ) p(r i s)dr i L(β 1, β 2, s) = i L i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 64 / 74
r i global parameter local parameter Generalized Linear Mixed Model (GLMM) Mixed : β 1 + β 2 x i + r i global parameter fixed effects: β 1, β 2 : s local parameter random effects: {r 1, r 2,, r 100 } kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 65 / 74
r i r i kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 66 / 74
r i r r = 2.20 q = 0.10 y 0 2 4 6 8 r = 0.60 q = 0.35 y 0 2 4 6 8 r = 1.00 q = 0.73 y 0 2 4 6 8 r = 2.60 q = 0.93 y 0 2 4 6 8..... r p(r s) p(r) = 0.10 r -5 0 5 p(r) = 0.13 r -5 0 5 p(r) = 0.13 r -5 0 5 p(r) = 0.09 r -5 0 5 0 2 4 6 8 y kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 67 / 74
r i r r = 1.10 λ = 0.55 y 0 2 4 6 8 10 r = 0.30 λ = 1.22 y 0 2 4 6 8 10 r = 0.50 λ = 2.72 y 0 2 4 6 8 10 r = 1.30 λ = 6.05 y 0 2 4 6 8 10..... r p(r s) p(r) = 0.22 r -2-1 0 1 2 p(r) = 0.38 r -2-1 0 1 2 p(r) = 0.35 r -2-1 0 1 2 p(r) = 0.17 r -2-1 0 1 2 0 2 4 6 8 10 y kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 68 / 74
r i glmmml package GLMM > install.packages("glmmml") # if you don t have glmmml > library(glmmml) > glmmml(cbind(y, N - y) ~ x, data = d, family = binomial, + cluster = id) > d <- read.csv("data.csv") > head(d) N y x id 1 8 0 2 1 2 8 1 2 2 3 8 2 2 3 4 8 4 2 4 5 8 1 2 5 6 8 0 2 6 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 69 / 74
r i GLMM : ˆβ1, ˆβ 2, ŝ > glmmml(cbind(y, N - y) ~ x, data = d, family = binomial, + cluster = id)...(snip)... coef se(coef) z Pr(> z ) (Intercept) -4.13 0.906-4.56 5.1e-06 x 0.99 0.214 4.62 3.8e-06 Scale parameter in mixing distribution: 2.49 gaussian Std. Error: 0.309 Residual deviance: 264 on 97 degrees of freedom AIC: 270 ˆβ 1 = 4.13, ˆβ 2 = 0.99, ŝ = 2.49 kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 70 / 74
r i GLMM (A) (B) x = 4 yi 0 2 4 6 8 0 1 2 3 4 5 6 2 3 4 5 6 0 2 4 6 8 x i y kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 71 / 74
GLMM 9. GLMM kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 72 / 74
GLMM + GLMM I (A) pot A pot A pot B pot B (B) logitq i = β 1 + β 2 x i (GLM) q i : logitq i = β 1 + β 2 x i + r i (A) (B) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 73 / 74
GLMM + GLMM II (C) pot A pot B logitq i = β 1 + β 2 x i + r j (D) pot A pot B logitq i = β 1 + β 2 x i + r i + r j kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 74 / 74
GLMM GLMM random effects global parameter local parameter GLMM global parameter local parameter local parameter (e.g. + ) kubostat2013ou2 (http://goo.gl/rrhzey) ( ) (2) 2013 08 27 75 / 74