kubostat2017b p.1 agenda I 2017 (b) probabilit distribution and maimum likelihood estimation kubo@ees.hokudai.ac.jp http://goo.gl/76c4i 2017 11 14 : 2017 11 07 15:43 1 : 2 3? 4 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 1 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 2 / 42 statistaical models appeared in the class The development of linear models Hierarchical Baesian Model Be more fleible Generalized Linear Mied Model (GLMM) Incoporating random effects such as individualit parameter estimation MCMC MLE Generalized Linear Model (GLM) Alwas normal distribution? That's non-sense! MSE Linear model Kubo Doctrine: Learn the evolution of linear-model famil, firstl! kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 3 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 4 / 42 suppose that ou have a count data set... 0, 1, 2 the normal distribution... is NOT this one!? ( {0, 1, 2, 3, } ) response variable e.g. egg number () e.g. bod size () eplanator variable? kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 5 / 42 response variable? 0? NO! eplanator variable kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 6 / 42
kubostat2017b p.2 the Poisson disribution approimates data?! response variable eplanator variable fair distribution non-negative mean YES! Plot our data and observe it Choose proper distributons! the normal distributon is NOT good at everthing be-be, the normal distribution kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 7 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 8 / 42 : 2 : : 2012 05 18 http://goo.gl/ufq2 2. : R kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 9 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 10 / 42 : a simplified data set, eas to understand number of seeds taken from 50 imaginar plants : number of seeds per plant individual () individual i i 50 i {1, 2, 3,, 50} seed number of i i { i }! { i } = { 1, 2,, 50 } : { i } R > data [1] 2 2 4 6 4 5 2 3 1 2 0 4 3 3 3 3 4 2 7 2 4 3 3 3 4 [26] 3 7 5 3 1 7 6 4 6 5 2 4 7 2 2 6 2 4 5 4 5 1 3 2 3 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 11 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 12 / 42
kubostat2017b p.3 : : R: a free statistical software : R http://www.r-project.org/ RStudio OS free software S RStudio http://www.rstudio.com/ kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 13 / 42 http://www.rstudio.com/ kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 14 / 42 : appl table() to categorize data R : start with data plotting, alwas table() > table(data) 0 1 2 3 4 5 6 7 1 3 11 12 10 5 4 4 ( 5 5 6 4 ) > hist(data, breaks = seq(-0.5, 9.5, 1))! kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 15 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 16 / 42 : How to evaluate mean value using R? : statistics to represent dispersion > mean(data) [1] 3.56 > abline(v = mean(data))! kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 17 / 42 : > var(data) [1] 2.9861 sample standard deviation > sd(data) [1] 1.7280 > sqrt(var(data)) [1] 1.7280 sample variance (SD = variance) : : kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 18 / 42
kubostat2017b p.4 Empirical VS Theoretical Distributions 3. kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 19 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 20 / 42 empirical distribution > data.table <- table(factor(data, levels = 0:10)) > cbind( = data.table, prob = data.table / 50) prob 0 1 0.02 1 3 0.06 2 11 0.22 3 12 0.24 4 10 0.20 5 5 0.10 6 4 0.08 7 4 0.08 8 0 0.00 9 0 0.00 10 0 0.00 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 21 / 42? = {0, 1, 2, }? {p 0, p 1, p 2,, p 99, p 100, }? kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 22 / 42? Mathematical epression of the Poisson distribution () ()! probabilit p( λ) = λ ep( λ)! factorial 4! 1 2 3 4 ep( λ) = e λ (e = 2.718 ) kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 23 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 24 / 42
kubostat2017b p.5 the Poisson distribution? R (λ) 3.56 Poisson distribution > <- 0:9 # () > prob <- dpois(, lambda = 3.56) # > plot(, prob, tpe = "b", lt = 2) > # cbind > cbind(, prob) prob 1 0 0.02843882 2 1 0.10124222 3 2 0.18021114 4 3 0.21385056 5 4 0.19032700 6 5 0.13551282 7 6 0.08040427 8 7 0.04089132 9 8 0.01819664 10 9 0.00719778 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 25 / 42 the Poisson distribution represent data? > hist(data, seq(-0.5, 8.5, 0.5)) # > lines(, prob, tpe = "b", lt = 2) # kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 26 / 42 parameter λ is the mean of the Poisson distribution λ The Poisson distribution is useful if...? λ λ λ 0 : λ = = > # cbind > cbind(, prob) prob 1 0 0.02843882 2 1 0.10124222 3 2 0.18021114 4 3 0.21385056 5 4 0.19032700 6 5 0.13551282 7 6 0.08040427 8 7 0.04089132 9 8 0.01819664 10 9 0.00719778 {0, 1, 2,, } 1 p( λ) = 1 =0 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 27 / 42 : i {0, 1, 2, } count data i mean variance kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 28 / 42 λ changes the shape of distribution λ? p( λ) = λ ep( λ)! mean λ kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 29 / 42 4.!? fitting = parameter estimation kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 30 / 42
kubostat2017b p.6 (likelihood)??? likelihood L(λ) depends on the value of mean, λ L(λ) λ maimum likelihood estimation λ goodness of fit λ 3 { 1, 2, 3 } = {2, 2, 4} 0.180 0.180 0.19 = 0.006156 (the likelihood definition for the eample): L(λ) = ( 1 2 ) ( 2 2 ) ( 50 3 ) = p( 1 λ) p( 2 λ) p( 3 λ) p( 50 λ) = p( i λ) = λ i ep( λ), i i i! kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 31 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 32 / 42? evaluate not likelihood, but log likelihood!? λ changes the log likelihood, i.e., goodness of fit λ () (!) (log likelihood function) log L(λ) = i ( ) i i log λ λ log k k log L(λ) L(λ) λ kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 33 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 34 / 42 seek the maimum likelihood estimate, ˆλ ˆλ? log L(λ) = ( i i log λ λ i k log k) log likelihood -120-110 -100 * ˆλ = 3.56 2.0 2.5 3.0 3.5 4.0 4.5 5.0 (ML estimator): i i/50 d log L dλ (ML estimate): ˆλ = 3.56 = 0!? no one knows the true λ based on finite size data λ λ 3.5 0 100 300 2.5 3.0 3.5 4.0 4.5 ˆλ 50 3000 ˆλ λ 50 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 35 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 36 / 42
kubostat2017b p.7 : random number generation estimation 5. λ = 3.5 ˆλ = 3.56 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 37 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 38 / 42 prediction probabilit distributions appeared in the class λ = 3.5 ˆλ = 3.56 : ( : {0, 1, 2, 3, } : {0, 1, 2,, N} N : < < kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 39 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 40 / 42 GLMM 線形モデルの発展 推定計算方法階層ベイズモデル (HBM) MCMC もっと自由な統計モデリングを! 一般化線形混合モデル (GLMM) 最尤推定法個体差 場所差といった変量効果をあつかいたい一般化線形モデル (GLM) 正規分布以外の確率分布をあつかいたい 最小二乗法線形モデル The net topic YES! : Poisson Regression, a Generalized Linear Model (GLM) kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 41 / 42 kubostat2017b (http://goo.gl/76c4i) 2017 (b) 2017 11 14 42 / 42