kubostat2015e p.1 I 2015 (e) GLM kubo@ees.hokudai.ac.jp http://goo.gl/76c4i 2015 07 22 2015 07 21 16:26 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 1 / 42 1 N k 2 binomial distribution logit link function 3 4 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 2 / 42 statistaical models appeared in the class 6 GLM : 2012 05 18 http://goo.gl/ufq2 The development of linear models Hierarchical Bayesian Model Be more flexible Generalized Linear Mixed Model (GLMM) Incoporating random effects such as individuality parameter estimation MCMC MLE Generalized Linear Model (GLM) Always normal distribution? That's non-sense! MSE Linear model Kubo Doctrine: Learn the evolution of linear-model family, firstly! kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 3 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 4 / 42? how to specify GLM Generalized Linear Model (GLM) (Poisson regression) (logistic regression) (linear regression) Generalized Linear Model (GLM) probability distribution?? link function? kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 5 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 6 / 42
kubostat2015e p.2 how to specify Poisson regression model, a GLM GLM how to specify model, a GLM GLM logistic probability distribution Poisson distribution : e.g., β 1 + β 2 x i link function log link function -2 0 2 4 6 0.5 1.0 1.5 2.0 probability distribution binomial distribution : e.g., β 1 + β 2 x i link function logit yi x i kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 7 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 8 / 42 N k N k? 1. N k seeds alive 8 y! y i {0, 1, 2,, 8} f i C: T: i N i = 8 y i = 3 (alive) (dead) x i kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 9 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 10 / 42 N k Reading data file N k data frame d data4a.csv CSV (comma separated value) format file R > d <- read.csv("data4a.csv") or > d <- read.csv( + "http://hosho.ees.hokudai.ac.jp/~kubo/stat/2015/fig/binomial/data4a.csv") d data frame ( ) > summary(d) N y x f Min. :8 Min. :0.00 Min. : 7.660 C:50 1st Qu.:8 1st Qu.:3.00 1st Qu.: 9.338 T:50 Median :8 Median :6.00 Median : 9.965 Mean :8 Mean :5.08 Mean : 9.967 3rd Qu.:8 3rd Qu.:8.00 3rd Qu.:10.770 Max. :8 Max. :8.00 Max. :12.440 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 11 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 12 / 42
kubostat2015e p.3 N k binomial distribution logit link function > plot(d$x,, pch = c(21, 19)[d$f]) > legend("topleft", legend = c("c", "T"), pch = c(21, 19)) yi C T 2. binomial distribution logit link function x i fertilization effective? kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 13 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 14 / 42 binomial distribution logit link function binomial distribution : N y p(y N, q) = ( ) N q y (1 q) N y y ( N ) y N y p(y i 8, q) 0.0 0.1 0.2 0.3 0.4 q = 0.1 q = 0.3 q = 0.8 y i kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 15 / 42 binomial distribution logit link function logistic curve (z i: q i = logistic(z i ) = e.g. z i = β 1 + β 2x i) 1 1 + exp( z i ) > logistic <- function(z) 1 / (1 + exp(-z)) # > z <- seq(-6, 6, 0.1) > plot(z, logistic(z), type = "l") q 0.0 0.2 0.4 0.6 0.8 1.0 1 q = 1+exp( z) -6-4 -2 0 2 4 6 z kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 16 / 42 binomial distribution logit link function β 1 and β 2 change logistic curve logit link function binomial distribution logit link function {β 1, β 2 } = {0, 2} (A) β 2 = 2 β 1 (B) β 1 = 0 β 2 q 0.0 0.2 0.4 0.6 0.8 1.0 (A) β 2 = 2 β 1 = 2 β 1 = 0-3 -2-1 0 1 2 3 x β 1 = 3 0.0 0.2 0.4 0.6 0.8 1.0 (B) β 1 = 0 β 2 = 4 β 2 = 2-3 -2-1 0 1 2 3 x β 2 = 1 {β 1, β 2 } x q 0 q 1 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 17 / 42 logistic 1 q = 1 + exp( (β 1 + β 2 x)) = logistic(β 1 + β 2 x) logit q logit(q) = log 1 q = β 1 + β 2 x logit logistic logistic logit logit is the inverse function of logistic function, vice versa kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 18 / 42
kubostat2015e p.4 binomial distribution logit link function MLE for β 1 and β 2 R β 1 β 2 binomial distribution logit link function (A) f i =C (B) y 7 x 7 > glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)... Coefficients: (Intercept) x ft -19.536 1.952 2.022 x yi (A) f i =C x i (B) f i =T x i kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 19 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 20 / 42? 3. q logit(q) = log 1 q = β 1 + β 2 x + β 3 f + β 4 xf... in case that β 4 < 0, sometimes it predicts... y T C x kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 21 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 22 / 42 in today s example no interaction effect glm(y ~ x + f,...) glm(y ~ x + f + x:f,...) (A) (B) 4. y T C T C x x little difference kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 23 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 24 / 42
kubostat2015e p.5? How to avoid data/data? / : 10 3 200 60 3? ( ) avoidable data/data values probability : N k indices such as densities use statistical model with binomial distribution : specific leaf area (SLA) use offset term! described later offset! kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 25 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 26 / 42 unfortunately, sometimes fractions appear... example population densities in research plots offset : hard to avoid... outputs from some measuring machines light intensity index x light index {0.1, 0.2,, 1.0} 10 sometimes we have no choice but plot data/data values... glm(..., family = poisson) kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 27 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 28 / 42 What? Differences in plot size?!?! R data.frame: Area, light index number of plants x, y x A = / glm() offset > load("d2.rdata") > head(d, 8) # 8 Area x y 1 0.017249 0.5 0 2 1.217732 0.3 1 3 0.208422 0.4 0 4 2.256265 0.1 0 5 0.794061 0.7 1 6 0.396763 0.1 1 7 1.428059 0.6 1 8 0.791420 0.3 1 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 29 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 30 / 42
kubostat2015e p.6 vs A vs y > plot(d$x, / ) > plot(, ) / 0.2 0.4 0.6 0.8 1.0 d$x kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 31 / 42 A y kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 32 / 42 x () x > plot(,, cex = d$x * 2)? kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 33 / 42 y x kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 34 / 42 = GLM! 1. i y i λ i : y i Pois(λ i ) 2. λ i A i x i λ i = A i exp(β 1 + β 2 x i ) λ i = exp(β 1 + β 2 x i + log(a i )) log(λ i ) = β 1 + β 2 x i + log(a i ) log(a i ) offset ( β ) family: poisson, link "log" : y ~ x offset : log(area) z = β 1 + β 2 x + log(area) a, b λ log(λ) = z λ = exp(z) = exp(β 1 + β 2 x + log(area)) λ : kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 35 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 36 / 42
kubostat2015e p.7 glm() R glm() > fit <- glm(y ~ x, family = poisson(link = "log"), data = d, offset = log(area)) > print(summary(fit)) Call: glm(formula = y ~ x, family = poisson(link = "log"), data = d, offset = log(area)) (......) Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) 0.321 0.160 2.01 0.044 x 1.090 0.227 4.80 1.6e-06 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 37 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 38 / 42 Plotting the model prediction based on estimation : glm() offset x = 0.9 light environment x = 0.1 dark environment offset = exp( ) solid lines prediction glm() dotted lines true model kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 39 / 42 kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 40 / 42 Improve your statisitcal model and remove data/data values! avoidable data/data values probability : N k indices such as densities use statistical model with binomial distribution : specific leaf area (SLA) use offset term! Improve your statistical model! offset kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 41 / 42 yi (A) The next topic 0 1 2 3 4 5 6 (B) x i = 4 2 3 4 5 6 x i y i Generalized Linear Mixed Model (GLMM) kubostat2015e (http://goo.gl/76c4i) 2015 (e) 2015 07 22 42 / 42