kubostat207e p. I 207 (e) GLM kubo@ees.hokudai.ac.jp https://goo.gl/z9ycjy 207 4 207 6:02 N y 2 binomial distribution logit link function 3 4! offset kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 2 / 47 statistaical models appeared in the class 6 GLM : 202 05 8 http://goo.gl/ufq2 The development of linear models Hierarchical Bayesian Model Be more flexible Generalized Linear Mixed Model (GLMM) Incoporating random effects such as individuality parameter estimation MCMC MLE Generalized Linear Model (GLM) Always normal distribution? That's non-sense! MSE Linear model Kubo Doctrine: Learn the evolution of linear-model family, firstly! kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 3 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 4 / 47? how to specify GLM Generalized Linear Model (GLM) (Poisson regression) (logistic regression) (linear regression) Generalized Linear Model (GLM) probability distribution? linear predictor? link function? kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 5 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 6 / 47
kubostat207e p.2 how to specify Poisson regression model, a GLM GLM how to specify model, a GLM GLM logistic probability distribution Poisson distribution : linear predictor e.g., β + β 2 x i link function log link function -2 0 2 4 6 0.5.0.5 2.0 probability distribution binomial distribution : linear predictor e.g., β + β 2 x i link function logit yi x i kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 7 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 8 / 47 N y N y?. N y seeds alive 8 y! y i {0,, 2,, 8} f i C: T: i N i = 8 y i = 3 (alive) (dead) x i kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 9 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 0 / 47 N y Reading data file N y data frame d data4a.csv CSV (comma separated value) format file R > d <- read.csv("data4a.csv") or > d <- read.csv( + "http://hosho.ees.hokudai.ac.jp/~kubo/stat/205/fig/binomial/data4a.csv") d data frame ( ) > summary(d) N y x f Min. :8 Min. :0.00 Min. : 7.660 C:50 st Qu.:8 st Qu.:3.00 st Qu.: 9.338 T:50 Median :8 Median :6.00 Median : 9.965 Mean :8 Mean :5.08 Mean : 9.967 3rd Qu.:8 3rd Qu.:8.00 3rd Qu.:0.770 Max. :8 Max. :8.00 Max. :2.440 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 2 / 47
kubostat207e p.3 N y binomial distribution logit link function > plot(d$x, d$y, pch = c(2, 9)[d$f]) > legend("topleft", legend = c("c", "T"), pch = c(2, 9)) yi 2. binomial distribution logit link function x i fertilization effective? kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 3 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 4 / 47 binomial distribution logit link function binomial distribution : N y p(y N, q) = ( ) N q y ( q) N y y ( N ) y N y p(y i 8, q) 0.0 0. 0.2 0.3 0.4 q = 0. q = 0.3 q = 0.8 y i kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 5 / 47 binomial distribution logit link function logistic curve (z i: q i = logistic(z i ) = linear predictor e.g. z i = β + β 2x i) + exp( z i ) > logistic <- function(z) / ( + exp(-z)) # > z <- seq(-6, 6, 0.) > plot(z, logistic(z), type = "l") q 0.0 0.2 0.4 0.6 0.8.0 q = +exp( z) -6-4 -2 0 2 4 6 z kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 6 / 47 binomial distribution logit link function β and β 2 change logistic curve logit link function binomial distribution logit link function {β, β 2 } = {0, 2} (A) β 2 = 2 β (B) β = 0 β 2 q 0.0 0.2 0.4 0.6 0.8.0 (A) β 2 = 2 β = 2 β = 0-3 -2-0 2 3 x β = 3 0.0 0.2 0.4 0.6 0.8.0 (B) β = 0 β 2 = 4 β 2 = 2-3 -2-0 2 3 x β 2 = {β, β 2 } x q 0 q kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 7 / 47 logistic q = + exp( (β + β 2 x)) = logistic(β + β 2 x) logit q logit(q) = log q = β + β 2 x logit logistic logistic logit logit is the inverse function of logistic function, vice versa kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 8 / 47
kubostat207e p.4 binomial distribution logit link function MLE for β and β 2 R β β 2 binomial distribution logit link function y (A) f i =C x (B) > glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)... Coefficients: (Intercept) x ft -9.536.952 2.022 x yi (A) f i =C x i (B) f i =T x i kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 9 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 20 / 47? 3. q logit(q) = log q = β + β 2 x + β 3 f + β 4 xf... in case that β 4 < 0, sometimes it predicts... y T C 8 9 0 2 x kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 2 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 22 / 47 in today s example no interaction effect! offset ^^I glm(y ~ x + f,...) glm(y ~ x + f + x:f,...) (A) (B) 4.! y T C T C offset 8 9 0 2 x 8 9 0 2 x little difference kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 23 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 24 / 47
kubostat207e p.5! offset?! How to avoid data/data? offset / : 0 3 200 60 3? ( ) avoidable data/data values probability : N y indices such as densities use statistical model with binomial distribution : specific leaf area (SLA) use offset term! described later offset! kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 25 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 26 / 47! unfortunately, sometimes fractions appear... offset! offset example population densities in research plots offset : hard to avoid... outputs from some measuring machines light intensity index x light index {0., 0.2,,.0} 0 sometimes we have no choice but plot data/data values... glm(..., family = poisson) kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 27 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 28 / 47! offset What? Differences in plot size?!?!! R data.frame: Area, offset light index number of plants x, y x A = /! glm() offset > load("d2.rdata") > head(d, 8) # 8 Area x y 0.07249 0.5 0 2.27732 0.3 3 0.208422 0.4 0 4 2.256265 0. 0 5 0.79406 0.7 6 0.396763 0. 7.428059 0.6 8 0.79420 0.3 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 29 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 30 / 47
kubostat207e p.6! offset! offset vs A vs y > plot(d$x, d$y / d$area) > plot(d$area, d$y) d$y/d$area 0 5 0 5 d$y 0 5 0 5 0.2 0.4 0.6 0.8.0 d$x 0.0.0 2.0 3.0 d$area kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 3 / 47 A y kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 32 / 47! offset! offset x () x > plot(d$area, d$y, cex = d$x * 2) d$y 0 5 0 5 0.0.0 2.0 3.0 d$area? kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 33 / 47 y x kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 34 / 47! offset! offset = GLM!. i y i λ i : y i Pois(λ i ) 2. λ i A i x i λ i = A i exp(β + β 2 x i ) λ i = exp(β + β 2 x i + log(a i )) log(λ i ) = β + β 2 x i + log(a i ) log(a i ) offset ( β ) family: poisson, link "log" : y ~ x offset : log(area) z = β + β 2 x + log(area) a, b λ log(λ) = z λ = exp(z) = exp(β + β 2 x + log(area)) λ : kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 35 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 36 / 47
kubostat207e p.7! offset! offset glm() R glm() > fit <- glm(y ~ x, family = poisson(link = "log"), data = d, offset = log(area)) > print(summary(fit)) Call: glm(formula = y ~ x, family = poisson(link = "log"), data = d, offset = log(area)) (......) Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) 0.32 0.60 2.0 0.044 x.090 0.227 4.80.6e-06 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 37 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 38 / 47! offset Plotting the model prediction based on estimation! offset : glm() offset d$y 0 5 0 5 x = 0.9 light environment x = 0. dark environment offset = exp( ) d$y 0 5 0 5 0.0.0 2.0 3.0 d$area 0.0.0 2.0 3.0 d$area solid lines prediction glm() dotted lines true model kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 39 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 40 / 47! offset Improve your statisitcal model and remove data/data values! avoidable data/data values probability : N y indices such as densities use statistical model with binomial distribution : specific leaf area (SLA) use offset term! Improve your statistical model! offset! kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 4 / 47! offset The next topic 0 2 3 4 5 6 y i N y Hierarchical Bayesian Model (HBM)? kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 42 / 47
kubostat207e p.8! offset! offset A preview of continuous probability distributions to construct Hierarchical Bayesian Models kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 43 / 47? discrete probability distributions?? continuous probability distributions? kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 44 / 47! offset! offset discrete probability distributions ( ) Poisson distribution λ changes the shape of distribution λ probability distribution, the core of statistical model Binomial distribution binomial distribution logit link function binomial distribution : N y Uniform distribution (continuous) an important device for HBM parameter: min (a) and max (b) p(y λ) = λy exp( λ) mean λ y! ( ) N p(y N, q) = q y ( q) N y y! ) N y ( N y q = 0. q = 0.8 q = 0.3 p(yi 8, q) yi 0.0 0. 0.2 0.3 0.4 f (x) b a kubostat207b (http://goo.gl/76c4i) 207 (b) 207 06 2 29 / 42 kubostat207e (http://goo.gl/76c4i) 207 (e) 207 06 2 5 / 47 0 a b x kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 45 / 47 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 46 / 47! offset the normal or Gaussian distribution an important device for HBM parameter: mean (µ) and SD (s > 0) (mean) µ = 0 Standard Deviation (SD) s s =.0 s =.5 s = 3.0 x ( ) p(x s) = exp x2 2πs 2 2s 2 kubostat207e (https://goo.gl/z9ycjy) 207 (e) 207 4 47 / 47