
Lecture notes: statistical modeling for data analysis, part 3 (2012-11-01, k3)
kubo@ees.hokudai.ac.jp
web: http://goo.gl/wijx2
web: http://goo.gl/ufq2

Contents
 1  Example: four GLMs fitted to the seed-count data
 2  Badness of fit: the deviance
 3  AIC: a model selection criterion
 4  Statistical tests
 5  Null hypothesis and alternative hypothesis
 6  Example: constant model versus x model
 7  The Neyman-Pearson testing framework
 8  The distribution of D_{1,2} and the P value
    8.1  Method (1): parametric bootstrap
    8.2  Method (2): χ² approximation
 9  When the null hypothesis cannot be rejected
10  AIC and hypothesis testing compared

[Figure 1: the 100 observations y plotted against x (data from lecture note k2), with two fitted Poisson GLMs: (A) the constant model, k = 1; (B) the 6th-order polynomial model, k = 7.]

Consider again the seed-count data used in the previous lecture note (k2): for each of 100 plants, the count response y (number of seeds) is recorded together with the body size x. Figure 1 shows two Poisson GLMs fitted to these data: (A) the constant model, log λ = β₁, with k = 1 parameter, and (B) a model containing polynomial terms of x up to the 6th order, log λ = β₁ + β₂x + ... + β₇x⁶, with k = 7 parameters. The wiggly k = 7 curve follows the observed points more closely than the flat k = 1 line, but is it therefore the better model? Deciding among candidate models is called model selection. This lecture introduces two approaches to it, model selection by AIC and statistical tests, both illustrated with GLMs fitted in R to the 100 observations.
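The two models of Fig. 1 can be fitted along the following lines. This is a sketch, not code shown in the notes; it assumes the data are already in a data frame d with columns y and x, as used later in these notes, and it builds the 6th-order polynomial with poly(), which gives the same fitted curve and deviance as raw powers of x.

fit.k1 <- glm(y ~ 1, data = d, family = poisson)           # (A): log lambda = beta1, k = 1
fit.k7 <- glm(y ~ poly(x, 6), data = d, family = poisson)  # (B): polynomial in x up to degree 6, k = 7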

1  Example: four GLMs fitted to the seed-count data

[Figure 2: predicted mean λ plotted against body size x for four Poisson GLMs fitted to the data: (A) the constant model, k = 1; (B) the f model, k = 2; (C) the x model, k = 2; (D) the x + f model, k = 3.]

To the 100 observations y_i of Figure 1 we fit four Poisson GLMs with R's glm(): the model whose mean depends only on body size x_i (the x model; Fig. 2 C), the model whose mean depends only on the fertilization treatment f_i (the f model; Fig. 2 B), the model that uses both (the x + f model; Fig. 2 D), and the model with no explanatory variables at all, λ = exp(β₁), whose single parameter is the intercept β₁ (the constant model; Fig. 2 A). In every case the parameters are estimated by maximizing the log likelihood, and the maximized value is the maximum log likelihood.
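A sketch of the four fits, assuming the data frame d (columns y, x and f) is already loaded as in these notes; the object names fit.null, fit.f, fit.x and fit.xf are used only for illustration here.

fit.null <- glm(y ~ 1,     data = d, family = poisson)  # constant model, k = 1
fit.f    <- glm(y ~ f,     data = d, family = poisson)  # f model,        k = 2
fit.x    <- glm(y ~ x,     data = d, family = poisson)  # x model,        k = 2
fit.xf   <- glm(y ~ x + f, data = d, family = poisson)  # x + f model,    k = 3
logLik(fit.x)   # maximum log likelihood of the x model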

2  Badness of fit: the deviance

When a GLM is fitted with glm(), R searches for the parameter values {β_j} that maximize the log likelihood log L({β_j}); the maximized value is the maximum log likelihood log L*. The deviance is defined from it as

    D = -2 log L*

so a better fit (a larger log L*) means a smaller deviance. The factor -2 is conventional and, as we will see, convenient when deviance differences are compared with a χ² distribution.

For the model λ_i = exp(β₁ + β₂ x_i), the x model of Fig. 2 C, the maximum log likelihood is log L* = -235.4, so D = -2 log L* = 470.8. This 470.8, however, does not appear in the glm() output, which reports instead

    Null Deviance:     89.51
    Residual Deviance: 84.99    AIC: 474.8

How are the deviance D, the null deviance, the residual deviance and AIC related? The residual deviance is defined as D minus the deviance of the full model, the model that spends one free parameter on every data point (here 100 parameters for the 100 observations). The relationships are laid out in Fig. 3.
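A minimal sketch relating logLik() to the deviance D = -2 log L*, assuming the data frame d and the fit fit.x from the sketch above:

logLik(fit.x)                     # about -235.4
-2 * as.numeric(logLik(fit.x))    # about 470.8, the deviance D
fit.x$deviance                    # 84.99, the *residual* deviance that glm() reports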

[Figure 3: how the deviances relate: the constant model has D = 475.3 and the x model D = 470.8; subtracting the full-model deviance 385.8 gives the null deviance 89.5 and the residual deviance 85.0.]

The full model assigns every observation its own mean parameter, so it fits the observed counts y_i = {6, 6, 6, 12, 10, ...} (100 values) as closely as a Poisson model possibly can: for observations i ∈ {1, 2, 3}, whose y_i are all 6, it sets {λ₁, λ₂, λ₃} = {6, 6, 6}; for i = 4, where y₄ = 12, it sets λ₄ = 12; for i = 5, where y₅ = 10, it sets λ₅ = 10; and so on for all 100 observations. Its maximum log likelihood can be evaluated directly with R's dpois(), giving each y_i the mean λ_i = y_i:

> sum(log(dpois(d$y, lambda = d$y)))
[1] -192.8898

so the deviance of the full model is D = -2 log L* = 385.8. No model fitted to these data can have a smaller deviance. The residual deviance of the x model is its deviance minus the full-model deviance, 470.8 - 385.8 = 85.0, exactly what glm() printed as Residual Deviance: 84.99.
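An equivalent sketch using dpois(..., log = TRUE), again assuming the data frame d of these notes:

logLik.full <- sum(dpois(d$y, lambda = d$y, log = TRUE))  # about -192.89
D.full <- -2 * logLik.full                                # about 385.8, the full-model deviance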

The opposite extreme is the model that contains only the intercept: λ_i = exp(β₁), i.e. log λ_i = β₁, with k = 1 parameter (Fig. 2 A). In R this is called the null model (the name derives from the null hypothesis), and it is specified as glm(y ~ 1, ...):

> fit.null <- glm(formula = y ~ 1, family = poisson, data = d)

The estimate of β₁ in fit.null is 2.06, and the printed object includes

    Degrees of Freedom: 99 Total (i.e. Null);  99 Residual
    Null Deviance:     89.51
    Residual Deviance: 89.51    AIC: 477.3

so for the null model the residual deviance, 89.5, coincides with the null deviance. Its maximum log likelihood is

> logLik(fit.null)
'log Lik.' -237.6432 (df=1)

so its deviance is -2 log L* = 475.3 and its residual deviance is 475.3 - 385.8 = 89.5. In other words, the null deviance reported for any model is the deviance of the null model minus the deviance of the full model.

For each model we can now tabulate the number of parameters k, the maximum log likelihood log L*, the deviance D = -2 log L*, and the residual deviance D - 385.8; the result is Table 2.

3  AIC: a model selection criterion
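The relation just stated, null deviance = (deviance of the null model) - (deviance of the full model), can be checked numerically; a sketch assuming fit.null from the notes, the fit.x object from the earlier sketch, and the data frame d:

D.full <- -2 * sum(dpois(d$y, lambda = d$y, log = TRUE))  # full-model deviance, about 385.8
-2 * as.numeric(logLik(fit.null)) - D.full                # 475.3 - 385.8 = 89.5
fit.x$null.deviance                                       # 89.51, as reported by glm()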

Table 2. Number of parameters k, maximum log likelihood log L*, deviance D = -2 log L*, and residual deviance (D - 385.8) for each model.

  model       k    log L*    Deviance     Residual
                             -2 log L*    deviance
  constant    1    -237.6      475.3        89.5
  f           2    -237.6      475.3        89.5
  x           2    -235.4      470.8        85.0
  x + f       3    -235.3      470.6        84.8
  full      100    -192.9      385.8         0.0

Table 3. The same models with AIC = D + 2k added.

  model       k    log L*    Deviance     Residual      AIC
                             -2 log L*    deviance
  constant    1    -237.6      475.3        89.5       477.3
  f           2    -237.6      475.3        89.5       479.3
  x           2    -235.4      470.8        85.0       474.8
  x + f       3    -235.3      470.6        84.8       476.6
  full      100    -192.9      385.8         0.0       585.8

Which of these models should we choose? Choosing the one with the largest log L*, i.e. the smallest deviance, is no answer: adding parameters can only improve the fit to the data at hand, so the full model always wins. Deciding among candidate models is model selection, and a rule for doing so is a model selection criterion. The criterion used here is AIC (Akaike's information criterion), which evaluates not goodness of fit but goodness of prediction. It is defined as

    AIC = -2 ( (maximum log likelihood) - (number of parameters) )
        = -2 (log L* - k)
        = D + 2k.

Among the models in Table 3 the smallest AIC, 474.8, belongs to the x model, so model selection by AIC picks the x model. (For GLMs in R, this kind of AIC-based selection can be automated with stepAIC() in the MASS package.)

4  Statistical tests
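The AIC column of Table 3 can be reproduced either from the definition D + 2k or with R's AIC() function; a sketch assuming the four fits (fit.null, fit.f, fit.x, fit.xf) from the earlier sketch:

-2 * (as.numeric(logLik(fit.x)) - 2)   # AIC of the x model from the definition, about 474.8 (k = 2)
AIC(fit.null, fit.f, fit.x, fit.xf)    # df and AIC for all four models at once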

Model selection by AIC is not the only way to compare these models; another widely used approach is the statistical test, and the test appropriate for comparing nested GLMs is the likelihood ratio test. The tests treated in this lecture are parametric tests, and they are framed in the Neyman-Pearson manner, in which the models being compared are given asymmetric roles.

5  Null hypothesis and alternative hypothesis

In a Neyman-Pearson test one of the two models is designated the null hypothesis and the other the alternative hypothesis. The simpler model, the one obtained by fixing some parameters (for example at zero), plays the role of the null hypothesis; the more complex model is the alternative. Under the Neyman-Pearson theory the likelihood ratio test built on this pairing is connected with the idea of a most powerful test. How all this differs from model selection by AIC is taken up again in Section 10.

[Figure 4: the flow of model selection by AIC compared with the flow of a Neyman-Pearson statistical test.]

In a test the data are summarized in a test statistic. One asks whether the observed value of the statistic falls in a region that would be improbable, say outside the central 95% of its distribution, if the null hypothesis were true; if it does, the null hypothesis is rejected at the corresponding significance level, here 5%. This recipe is the Neyman-Pearson framework of statistical testing.

6  Example: constant model versus x model

Let us apply this framework to the seed-count data with the Poisson GLM λ_i = exp(β₁ + β₂ x_i) (Fig. 5). The two hypotheses are:

  null hypothesis:         the constant model, in which λ_i does not depend on x_i (β₂ = 0; k = 1)
  alternative hypothesis:  the x model, in which λ_i depends on x_i (β₂ ≠ 0; k = 2)

[Figure 5: the 100 observations y_i plotted against body size x_i, with (A) the constant model (deviance 475.3) and (B) the x model (deviance 470.8).]

Table 4. The constant model, the x model, and the full model (extracted from Table 3).

  model       k    log L*    Deviance     Residual      AIC
                             -2 log L*    deviance
  constant    1    -237.6      475.3        89.5       477.3
  x           2    -235.4      470.8        85.0       474.8
  full      100    -192.9      385.8         0.0       585.8

The deviance of the constant model is 475.3 and that of the x model is 470.8, a difference of 4.5. The likelihood ratio of the two models is

    L₁* / L₂* = (maximum likelihood of the constant model) / (maximum likelihood of the x model)
              = exp(-237.6) / exp(-235.4).

Rather than using this ratio directly, we take logarithms and multiply by -2, which gives the deviance difference

    D_{1,2} = -2 (log L₁* - log L₂*) = D₁ - D₂,

where D₁ = -2 log L₁* and D₂ = -2 log L₂*. D_{1,2} measures by how much the deviance drops, that is, how much the fit improves, when the explanatory variable x is added; here D_{1,2} = 4.5. (The factor -2 is used because D_{1,2} is then approximately χ² distributed; see Section 8.2.) The question for the test is whether an improvement of 4.5 is large enough to be convincing.
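In R the observed deviance difference can be computed either from the log likelihoods or from the residual deviances; a sketch assuming the data frame d of these notes (fit1 and fit2 are the same fits used again in Section 8.1):

fit1 <- glm(y ~ 1, data = d, family = poisson)   # constant model (null hypothesis)
fit2 <- glm(y ~ x, data = d, family = poisson)   # x model (alternative hypothesis)
-2 * (as.numeric(logLik(fit1)) - as.numeric(logLik(fit2)))  # D_{1,2}, about 4.5
fit1$deviance - fit2$deviance                               # the same value, via residual deviances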

7  The Neyman-Pearson testing framework

In the Neyman-Pearson framework the two models are cast as

  null hypothesis:         the constant model (k = 1, β₂ = 0)
  alternative hypothesis:  the x model (k = 2, β₂ ≠ 0)

and the deviance difference D_{1,2} is used as the test statistic. Two kinds of mistake are possible (Table 5).

Table 5. The two kinds of error in a test based on D_{1,2}.

                               judge D_{1,2} large        judge D_{1,2} small
                               (reject the null)          (do not reject the null)
  null hypothesis true         type I error               correct decision
  (β₂ = 0)
  null hypothesis false        correct decision           type II error
  (β₂ ≠ 0)

A type I error occurs when β₂ = 0 is actually true but the observed D_{1,2} = 4.5 is judged large, so that we wrongly conclude that x has an effect (β₂ ≠ 0). A type II error occurs when the x model is actually true but D_{1,2} = 4.5 is judged small, so that we wrongly conclude that β₂ = 0. The Neyman-Pearson framework concentrates on controlling the probability of a type I error. For the present example the procedure is:

1. Suppose that the null hypothesis, the constant model, is true.
2. Take as the true model the constant model fitted to the data, with β̂₁ = 2.06 (the null-model estimate obtained in Section 2).
3. Generate many data sets from this model, fit both the β₂ = 0 (k = 1) and the β₂ ≠ 0 (k = 2) model to each, and compute D_{1,2}; this yields the distribution of D_{1,2} under the null hypothesis (Fig. 6).

[Figure 6: example data sets generated under the null hypothesis (the constant model with β̂₁ = 2.06), plotted against x, each with its own value of D_{1,2}.]

8  The distribution of D_{1,2} and the P value

The probability, computed under the null hypothesis, that the deviance difference is at least as large as the observed value, D_{1,2} ≥ 4.5, is called the P value. If the P value is large, a value like D_{1,2} = 4.5 occurs frequently even when x has no effect, so the observed improvement is no evidence for the x model. If the P value is small, a value as large as D_{1,2} = 4.5 would rarely arise under the null hypothesis, and we conclude that x does have an effect.

In Neyman-Pearson testing one fixes a significance level α before looking at the data and then decides:

  if P ≥ α: the null hypothesis cannot be rejected;
  if P < α: the null hypothesis is rejected.

A conventional choice is α = 0.05, that is, a 1-in-20 chance.

8.1  Method (1): parametric bootstrap

The P value is the probability, under the null hypothesis, that D_{1,2} ≥ 4.5. One way to evaluate it is to generate the distribution of D_{1,2} by simulation; this is the parametric bootstrap (PB) method. The other way is the χ² approximation of Section 8.2.

First fit the two models to the observed data with glm(): fit1 is the constant model and fit2 is the x model. Their residual deviances give the observed deviance difference:

> fit2$deviance
[1] 84.993
> fit1$deviance - fit2$deviance
[1] 4.5139

so the observed D_{1,2} is about 4.5. If the null hypothesis is true, every y_i is a Poisson count with mean 7.85 (the null-model estimate β̂₁ = 2.06 gives exp(2.06) = 7.846, which is mean(d$y)). A new data set of 100 counts can therefore be generated with rpois():

> d$y.rnd <- rpois(100, lambda = mean(d$y))

where mean(d$y) is 7.85. We then fit the constant model and the x model to this simulated response with glm() and compute their deviance difference:

> fit1 <- glm(y.rnd ~ 1, data = d, family = poisson)
> fit2 <- glm(y.rnd ~ x, data = d, family = poisson)
> fit1$deviance - fit2$deviance
[1] 1.920331

For this particular simulated data set the deviance difference is about 1.92: adding x "improves" the fit by 1.92 even though x has no effect in the mechanism that generated d$y.rnd. One parametric-bootstrap replicate therefore consists of three steps:

1. generate d$y.rnd, 100 Poisson random counts with mean mean(d$y);
2. fit the constant model and the x model to d$y.rnd with glm(), giving fit1 and fit2;
3. record fit1$deviance - fit2$deviance.

Repeating this, say, 1000 times yields the distribution of D_{1,2} under the null hypothesis; generating the distribution of a statistic by simulating from a fitted model in this way is the (parametric) bootstrap method. In R the whole procedure can be packaged as a function pb(), stored here in the file pb.r (available from the course web page):

get.dd <- function(d) # simulate one data set under the null and return its D_{1,2}
{
    n.sample <- nrow(d)   # number of observations
    y.mean <- mean(d$y)   # sample mean, the null-model estimate of lambda
    d$y.rnd <- rpois(n.sample, lambda = y.mean)
    fit1 <- glm(y.rnd ~ 1, data = d, family = poisson)
    fit2 <- glm(y.rnd ~ x, data = d, family = poisson)
    fit1$deviance - fit2$deviance # deviance difference for this replicate
}
pb <- function(d, n.bootstrap)
{
    sapply(1:n.bootstrap, function(i) get.dd(d)) # repeat n.bootstrap times
}

(In fact fit1 is not needed inside get.dd(): fit2$null.deviance - fit2$deviance gives the same D_{1,2}.)
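As a side note on the design, the repetition in pb() can equally be written with replicate(), which makes it explicit that the loop index is unused. This variant is a sketch, not the form given in the notes, and assumes the get.dd() function defined above:

pb <- function(d, n.bootstrap)
{
    replicate(n.bootstrap, get.dd(d))   # run get.dd(d) n.bootstrap times
}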

The functions in pb.r are loaded with source(), and 1000 bootstrap values of D_{1,2} are generated (for a more precise picture of the distribution, use an n.bootstrap on the order of 10⁴ rather than 10³):

> source("pb.r")   # read in get.dd() and pb()
> dd12 <- pb(d, n.bootstrap = 1000)

The resulting vector dd12 can be inspected with summary():

> summary(dd12)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.229e-08 8.879e-02 4.752e-01 1.025e+00 1.339e+00 1.987e+01

Figure 7 shows its histogram together with the observed D_{1,2} = 4.5:

> hist(dd12, 100)
> abline(v = 4.5, lty = 2)

[Figure 7: histogram of the 1000 parametric-bootstrap values of D_{1,2}; the dashed vertical line marks the observed D_{1,2} = 4.5.]

How many of the 1000 simulated values are 4.5 or larger?

> sum(dd12 >= 4.5)
[1] 38

Since 38 of the 1000 values of D_{1,2} are at least 4.5, the P value is estimated as 38 / 1000 = 0.038, which is smaller than 0.05.
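The estimated P value is simply the proportion of bootstrap values at or above the observed difference, so it can also be computed in one line; a sketch using the dd12 vector generated above:

mean(dd12 >= 4.5)   # estimated P value; about 0.038 for the run shown above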

The value of D_{1,2} at which P would equal α = 0.05, the critical point, is the 95% quantile of the bootstrap distribution:

> quantile(dd12, 0.95)
     95%
3.953957

Values of D_{1,2} beyond this point form the critical region (rejection region): any observed difference larger than about 3.95 is rejected at the 5% level. The observed D_{1,2} = 4.5 lies in this region, and its estimated P value, 0.038, is smaller than α = 0.05. We therefore reject the null hypothesis and say that the constant model and the x model are significantly different (in Neyman-Pearson terms, "significant" means nothing more than P < α); under this test the x model is chosen.

8.2  Method (2): χ² approximation

Instead of generating the distribution of D_{1,2} by parametric bootstrap (Fig. 7), it can be approximated by a χ² distribution. Fit the constant model fit1 and the x model fit2 to the observed data,

> fit1 <- glm(y ~ 1, data = d, family = poisson)
> fit2 <- glm(y ~ x, data = d, family = poisson)

and compare them with anova() (despite the name, which comes from ANOVA, analysis of variance, what it produces here is an analysis-of-deviance table):

> anova(fit1, fit2, test = "Chisq")
Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ x
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        99     89.507
2        98     84.993  1    4.514     0.034

Here the deviance difference D_{1,2} is compared with a χ² distribution with 1 degree of freedom, the difference in the number of parameters between the two models.
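The χ²-based P value can also be obtained directly from the observed deviance difference with pchisq(); a minimal sketch, where 4.514 is the observed D_{1,2} and df = 1 is the difference in parameter counts:

pchisq(4.514, df = 1, lower.tail = FALSE)   # about 0.034, matching the anova() output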

2012 11 01 k3 (2012-10-24 14:07 ) 17 "Chisq" χ 2 D 1,2 4.5 P 0.034 P PB P = 0.038 χ 2 100 "Chisq" P PB 44 χ 2 t F 9 α = 0.05 D 1,2 P < α P α fail to reject Neyman-Pearson 45 Neyman-Pearson 7 P < α P α 5 P 2 46 Neyman-Pearson P P 2 P 2 1 P 2 ; power 47 44 PB 45 46 β β 1 P 2 47

10  AIC and hypothesis testing compared

Model selection by AIC and Neyman-Pearson testing answer different questions and should not be confused. AIC asks which model is expected to give the better prediction and treats the candidate models symmetrically. A Neyman-Pearson test asks only whether the null hypothesis can be rejected; the null and alternative hypotheses play asymmetric roles, and a small P value by itself says nothing about how large the effect is or how well the model will predict (on this point, see discussions of effect size). With a very large sample even a negligibly small effect, say a ratio of 1.000001, can yield P < α, so "significant" must not be read as "important"; conversely, a model selected by AIC has not thereby been shown to be significantly better than its competitors.