
Resampling-based inference using the mosaic package

Daniel Kaplan (Macalester College, St. Paul, MN), dtkaplan@gmail.com
Nicholas J. Horton (Amherst College, Amherst, MA), nhorton@amherst.edu
Randall Pruim (Calvin College, Grand Rapids, MI), rpruim@calvin.edu

August 17, 2013

1 Introduction

This document shows how the mosaic package for R can be used to teach statistical inference through randomization and resampling.

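George Cobb summarized the logic of inference as the "3 Rs": Randomization, Replication, Rejection (Cobb, 2007). Terry Speed has written along similar lines about simulation (Speed, 2011). At USCOTS 2011 (United States Conference on Teaching Statistics), Robin Lock posed a set of resampling problems in a breakout session (http://www.causeweb.org/uscots/breakout/breakout3_6.php).

mosaic is developed by the MOSAIC project (www.mosaic-web.org). For background on the bootstrap and permutation tests, see Efron and Tibshirani (1993) and Hesterberg et al. (2005). Section 2 covers setup, and Section 3 works through Lock's problems with mosaic.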

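2 Getting started

2.1 R and RStudio

The examples use R. They can be run in the RStudio interface (http://www.rstudio.org).

2.2 Installing and loading mosaic

Install mosaic once, then load it in each R session:

    install.packages("mosaic")  # needed only once
    require(mosaic)
    options(digits = 3)

Data files can be read with R's read.csv(), which also accepts a URL. mosaic's fetchData() simplifies loading the data files used in this document. For example, the data for Lock problem 1 are loaded as follows:

    mustangs <- fetchData("MustangPrice.csv")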

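3 The Lock problems

The problems in this section are taken from Robin Lock's 2011 USCOTS session.

Lock problem 1. The file MustangPrice.csv contains the age, mileage and price of used Ford Mustangs. Generate 1000 bootstrap samples and compute the mean price of each. Then use the 1000 bootstrap means to construct a 90% confidence interval for the mean price.

Start by looking at the data. The graphics come from the lattice package, which describes plots with a formula (~) and a data = argument. mosaic extends the same notation to numerical summaries such as mean():

    histogram(~Price, data = mustangs)
    mean(~Price, data = mustangs)
    [1] 16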

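This gives the same result as mean(mustangs$Price). The formula version uses the same form as the graphics command.

The key new function is resample(), which samples with replacement. Each call gives a different result:

    simple <- c(1, 2, 3, 4, 5)
    resample(simple)
    [1] 2 1 1 5 4
    resample(simple)
    [1] 1 3 5 3 1
    resample(simple)
    [1] 5 1 3 5 2

Applied to a data frame, resample() samples whole cases (rows). It records each row's original position in orig.ids:

    resample(mustangs)
         Age Miles Price orig.ids
    10     1   1.1  37.9       10
    20    14 102.0   8.2       20
    19    12 117.4   7.0       19
    25    14 115.1   4.9       25
    19.1  12 117.4   7.0       19
    6     15 111.0  10.0        6
    ... and so on

Case 19 appears twice in this resample (rows 19 and 19.1).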
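The resampling idea does not depend on mosaic. The following base-R sketch is an illustration, not mosaic's own implementation. It draws a bootstrap sample with sample(replace = TRUE) and resamples whole rows of a data frame. The Mustang file is not bundled with R, so the built-in mtcars data stand in for it. resample_rows() is a hypothetical helper introduced only for this example.

```r
# Base-R sketch of resampling with replacement (not mosaic's implementation).
set.seed(1)  # reproducible draws

simple <- c(1, 2, 3, 4, 5)
boot_vector <- sample(simple, size = length(simple), replace = TRUE)
print(boot_vector)

# Hypothetical helper: resample whole rows (cases) of a data frame and record
# which original row each resampled row came from, like mosaic's orig.ids column.
resample_rows <- function(df) {
  idx <- sample(nrow(df), size = nrow(df), replace = TRUE)
  out <- df[idx, , drop = FALSE]  # duplicated rows get suffixed names such as ".1"
  out$orig.ids <- idx
  out
}

boot_cars <- resample_rows(mtcars)
print(head(boot_cars[, c("mpg", "wt", "orig.ids")]))

# How many original cases were drawn more than once in this resample?
print(sum(table(boot_cars$orig.ids) > 1))
```

Without set.seed(), each run gives a different resample, just as repeated calls to resample() do.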

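Combining resample() with mean() gives the mean of one bootstrap sample. Each call gives a different value:

    mean(~Price, data = resample(mustangs))
    [1] 20.5
    mean(~Price, data = resample(mustangs))
    [1] 12.3

do() repeats a computation, here 5 times:

    do(5) * mean(~Price, data = resample(mustangs))
    $result
    [1] 14.6 15.5 16.1 17.9 14.5

    attr(,"row.names")
    [1] 1 2 3 4 5
    attr(,"class")
    [1] "do.data.frame"

For the problem's 1000 bootstrap samples, store the results in trials, plot them, and compute the 90% interval:

    trials <- do(1000) * mean(~Price, data = resample(mustangs))
    histogram(~result, data = trials, xlab = "Mean price (bootstrap samples)")
    confint(trials, level = 0.90, method = "quantile")
        name  5 % 95 %
    1 result 12.5 19.5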
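For comparison, the same percentile-bootstrap recipe can be written in base R with replicate() and quantile(). This sketch assumes that mtcars$mpg stands in for the Mustang prices. It illustrates the technique and does not reproduce the numbers in the text.

```r
# Base-R sketch of a 90% bootstrap percentile interval for a mean.
set.seed(2013)
x <- mtcars$mpg  # stand-in data; the Mustang prices are not bundled with R

# 1000 bootstrap means: resample with replacement, then average.
# This plays the role of do(1000) * mean(~Price, data = resample(mustangs)).
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Percentile ("quantile") method: 5th and 95th percentiles of the bootstrap means
ci90 <- quantile(boot_means, probs = c(0.05, 0.95))
print(round(ci90, 2))
# hist(boot_means) would draw the bootstrap distribution
```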

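The interval can also be based on the bootstrap standard error:

    confint(trials, level = 0.90, method = "stderr")
        name lower upper
    1 result  12.4  19.5

Both kinds of interval that confint() reports can also be computed directly:

- Quantile method. The 5th and 95th percentiles of the bootstrap distribution form the 90% interval:

    qdata(c(.05, .95), result, data = trials)
      5%  95%
    12.5 19.5

- Standard-error method. The margin of error is a critical value from the t or z distribution times the standard deviation of the bootstrap distribution. For 90% confidence the critical value is the 0.95 quantile:

    tstar <- qt(0.95, df = 24)
    zstar <- qnorm(0.95)
    tstar * sd(~result, data = trials)
    [1] 3.68
    zstar * sd(~result, data = trials)
    [1] 3.54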
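The arithmetic of the standard-error method can be checked from the reported numbers alone. trials itself is not reproduced here. So, for illustration, this sketch backs the bootstrap standard error out of the reported z margin and then recomputes the t margin.

```r
# Check of the standard-error method using only numbers reported in the text.
tstar <- qt(0.95, df = 24)  # t critical value for 90% confidence, df = 24 as in the text
zstar <- qnorm(0.95)        # z critical value for 90% confidence
print(round(c(tstar = tstar, zstar = zstar), 3))

# Bootstrap standard error implied by the reported z margin of 3.54
se_boot <- 3.54 / zstar
print(round(se_boot, 2))

# The t margin should then be close to the reported 3.68
print(round(tstar * se_boot, 2))
```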

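The level argument of confint() sets the confidence level. The method argument selects "quantile" or "stderr".

Lock problem 2. In NFL (National Football League) overtime games from 1974 to 2009, the team that won the coin toss won 240 of 428 games. If winning the toss gave no advantage, about 428/2 = 214 wins would be expected. How unusual are 240 or more wins?

One answer is to simulate many sets of 428 fair coin tosses with rbinom(). Then find the proportion of sets with 240 or more wins:

    prop(rbinom(100000, prob = 0.5, size = 428) >= 240)
       TRUE
    0.00699
    prop(rbinom(100000, prob = 0.5, size = 428) >= 240)
       TRUE
    0.00655

In both runs, 240 or more wins occur in fewer than 1% of the simulations.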
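The same question can be answered in base R, both by simulation and exactly. This sketch does three things:

- It repeats the rbinom() simulation.
- It computes the exact tail probability with pbinom(lower.tail = FALSE).
- As a rough stand-in for mosaic's rflip(), it counts heads among 428 sampled coin faces.

The variable names are illustrative.

```r
# Base-R sketch for Lock problem 2: how often do 428 fair tosses give 240+ wins?
set.seed(1974)
n_games <- 428
observed <- 240

# Simulation, as in the text: 100000 binomial counts of wins out of 428
sims <- rbinom(100000, size = n_games, prob = 0.5)
p_sim <- mean(sims >= observed)

# Exact upper tail: P(X >= 240) = P(X > 239)
p_exact <- pbinom(observed - 1, size = n_games, prob = 0.5, lower.tail = FALSE)

# A single set of 428 fair coin flips, similar in spirit to rflip(428)
heads_once <- sum(sample(c("H", "T"), n_games, replace = TRUE) == "H")

print(round(c(simulated = p_sim, exact = p_exact), 4))
print(heads_once)
```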

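The exact probability comes from the binomial distribution. pbinom(239, ...) gives P(X ≤ 239), so P(X ≥ 240) ≈ 1 - 0.993 = 0.007:

    pbinom(239, prob = 0.5, size = 428)
    [1] 0.993

mosaic's rflip() simulates coin flips directly:

    do(1) * rflip(428)
    $n
    [1] 428

    $heads
    [1] 206

    $tails
    [1] 222

    attr(,"row.names")
    [1] 1
    attr(,"class")
    [1] "do.data.frame"

Next, repeat the 428 flips 1000 times and find the proportion of trials with 240 or more heads:

    trials <- do(1000) * rflip(428)
    prop(trials$heads >= 240, data = trials)
     TRUE
    0.009
    histogram(~heads, groups = (heads >= 240), data = trials)

The groups = argument shades the bars for 240 or more heads differently.

Lock problem 3. In a memory experiment (Mednick et al., 2008), 24 adults were given a list of words to memorize. They were then randomly assigned to 2 groups of 12. One group took a 1.5-hour nap; the other took a caffeine pill. The response is the number of words each person recalled afterwards. Does the mean number of words recalled differ between the two treatments?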

Load the data for the two groups:

    sleep <- fetchData("SleepCaffeine.csv")

The Sleep group seems to have remembered somewhat more words on average:

    mean(Words ~ Group, data = sleep)
    Caffeine    Sleep
        12.2     15.2
    obs <- diff(mean(Words ~ Group, data = sleep))
    obs
    Sleep
        3
    bwplot(Words ~ Group, data = sleep)

[Figure: box plots of Words by Group]
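The permutation test that Lock problem 3 calls for can be sketched in base R by shuffling group labels with sample(). The sleep/caffeine file is not bundled with R. This sketch therefore uses mtcars as stand-in data and compares mpg between automatic and manual cars (am). mean_diff() is a hypothetical helper.

```r
# Base-R sketch of a one-sided permutation test for a difference in means.
set.seed(2008)
y <- mtcars$mpg
group <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Hypothetical helper: mean of the second group minus mean of the first,
# the analogue of diff(mean(Words ~ Group, data = sleep))
mean_diff <- function(y, g) {
  m <- tapply(y, g, mean)
  m[[2]] - m[[1]]
}

obs <- mean_diff(y, group)

# Null distribution: shuffle the labels, as shuffle(Group) does in the text
null_diffs <- replicate(1000, mean_diff(y, sample(group)))

# One-sided p-value: share of shuffled differences at least as large as observed
p_one_sided <- mean(null_diffs >= obs)
print(round(c(observed = obs, p = p_one_sided), 4))
```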

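If the treatment had no effect, the group labels would be arbitrary. shuffle() randomly permutes the labels, so each call gives a difference that could arise by chance alone:

    diff(mean(Words ~ shuffle(Group), data = sleep))
    Sleep
    -1.17
    diff(mean(Words ~ shuffle(Group), data = sleep))
    Sleep
    0.333

Repeating this 1000 times gives the randomization distribution under the null hypothesis. In the histogram, shuffles at least as large as the observed difference obs are shaded:

    trials <- do(1000) * diff(mean(Words ~ shuffle(Group), data = sleep))
    histogram(~Sleep, groups = (Sleep >= obs), data = trials,
              xlab = "Difference in mean words recalled\n(shuffled groups)")

The p-value is the proportion of the 1000 shuffled differences that are at least as large as the observed one. In this run 35 of them were, so p = 35/1000 = 0.035.

Lock problem 4. Using the Mustang data from problem 1, generate 1000 bootstrap samples. Use them to construct a 95% confidence interval for the correlation between price and mileage:

    cor(Price, Miles, data = mustangs)
    [1] -0.825
    trials <- do(1000) * cor(Price, Miles, data = resample(mustangs))
    quantiles <- qdata(c(.025, .975), result, data = trials)
    quantiles
      2.5%  97.5%
    -0.928 -0.720
    histogram(~result, data = trials,
              groups = cut(result, c(-Inf, quantiles, Inf)), nbin = 30)

4 Linear models

Lock's Mustang data also lend themselves to regression. Here is a scatterplot of price against mileage:

    xyplot(Price ~ Miles, data = mustangs)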

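The standard R function for fitting linear models is lm():

    lm(Price ~ Miles, data = mustangs)

    Call:
    lm(formula = Price ~ Miles, data = mustangs)

    Coefficients:
    (Intercept)        Miles
         30.495       -0.219

Price is in thousands of dollars and Miles in thousands of miles. So each additional 1000 miles is associated with a price about $219 (0.2188 thousand dollars) lower, or roughly 22 cents per mile.

lm() also fits models with no explanatory variable. With Price ~ 1, the intercept is the mean price:

    mean(Price, data = mustangs)
    [1] 16
    lm(Price ~ 1, data = mustangs)

    Call:
    lm(formula = Price ~ 1, data = mustangs)

    Coefficients:
    (Intercept)
             16

In the formula, ~ separates the response from the explanatory variables. In Price ~ Miles, Price is the response and Miles is the explanatory variable. In Price ~ 1, the 1 stands for the intercept alone. mean() accepts the same notation.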

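    mean(Price ~ 1, data = mustangs)
     1
    16

The same holds for the sleep data from Lock problem 3:

    mean(Words ~ 1, data = sleep)
       1
    13.8
    mean(Words ~ Group, data = sleep)
    Caffeine    Sleep
        12.2     15.2

Miles in the Mustang data is quantitative, but Group in the sleep data is categorical. lm() handles it as well:

    lm(Words ~ Group, data = sleep)

    Call:
    lm(formula = Words ~ Group, data = sleep)

    Coefficients:
    (Intercept)   GroupSleep
           12.2          3.0

The intercept is the Caffeine group mean from mean(). The GroupSleep coefficient is the difference between the two group means (15.2 - 12.2 = 3.0).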
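Two of the claims above are properties of least squares, and both can be checked in base R:

- The intercept of a ~ 1 model is the mean.
- With a two-level factor, the intercept is the baseline group's mean and the other coefficient is the difference in means.

This sketch uses mtcars as stand-in data, with transmission type as the two-level factor.

```r
# Base-R check with mtcars: lm() intercepts reproduce means and differences.
cars <- mtcars
cars$trans <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

fit0 <- lm(mpg ~ 1, data = cars)      # intercept-only model
fit1 <- lm(mpg ~ trans, data = cars)  # two-level categorical explanatory variable
group_means <- tapply(cars$mpg, cars$trans, mean)

print(coef(fit0))    # same as mean(cars$mpg)
print(group_means)   # automatic and manual means
print(coef(fit1))    # (Intercept) = automatic mean; transmanual = manual - automatic
```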

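This is the same difference that diff() computes:

    diff(mean(Words ~ Group, data = sleep))
    Sleep
        3

lm() also works with a categorical response. In the HELPrct data, here is the proportion of subjects who are homeless, overall and by sex:

    prop(homeless ~ 1, data = HELPrct)
    homeless:
        0.461
    prop(homeless ~ sex, data = HELPrct)
    homeless:female   homeless:male
              0.374           0.488

The difference between the two proportions is:

    diff(prop(homeless ~ sex, data = HELPrct))
    homeless:male
            0.115

For lm(), the response homeless (levels homeless and housed) is turned into a 0/1 indicator with homeless == "homeless":

    lm(homeless == "homeless" ~ 1, data = HELPrct)

    Call:
    lm(formula = homeless == "homeless" ~ 1, data = HELPrct)

    Coefficients:
    (Intercept)
          0.461

With ~ 1, the lm() intercept equals the proportion from prop(), just as it equalled the mean from mean(). Adding sex reproduces the female proportion and the difference computed with diff():

    lm(homeless == "homeless" ~ sex, data = HELPrct)

    Call:
    lm(formula = homeless == "homeless" ~ sex, data = HELPrct)

    Coefficients:
    (Intercept)      sexmale
          0.374        0.115

4.1 Randomization with lm()

lm() reproduces what mean(), prop() and diff() compute, so it can take their place in resampling. Here is a bootstrap of the Mustang regression:

    trials <- do(1000) * lm(Price ~ Miles, data = resample(mustangs))
    confint(trials)
           name  lower  upper
    1 Intercept 24.889 36.034
    2     Miles -0.277 -0.163
    3     Sigma  2.969  9.033
    4 r.squared  0.518  0.884

Here is a permutation test for the HELPrct difference between men and women, which shuffles sex to break any association:

    nulldist <- do(1000) * lm(homeless == "homeless" ~ shuffle(sex), data = HELPrct)
    prop(~ abs(sexmale) > 0.1146, data = nulldist)
     TRUE
    0.059

About 5.9% of the shuffled coefficients are larger in absolute value than the observed 0.1146. This gives a two-sided p-value of about 0.06.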
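A base-R version of the lm() bootstrap resamples row indices and refits the model each time. This sketch uses mtcars (mpg against weight, wt) in place of the Mustang data. It computes a percentile interval, which is not necessarily the method confint() applies to do() output.

```r
# Base-R sketch of bootstrapping a regression slope (mtcars stands in for mustangs).
set.seed(17)
fit <- lm(mpg ~ wt, data = mtcars)

# Refit on 1000 resampled data sets, the analogue of
# do(1000) * lm(Price ~ Miles, data = resample(mustangs))
boot_slopes <- replicate(1000, {
  idx <- sample(nrow(mtcars), replace = TRUE)        # resample cases
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))[["wt"]]   # slope for this resample
})

# 95% percentile interval for the slope
ci95 <- quantile(boot_slopes, probs = c(0.025, 0.975))
print(round(c(estimate = coef(fit)[["wt"]], ci95), 3))
```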

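4.2 Multiple explanatory variables

The Mustang data include both Age and Miles. Bootstrap three models: Age alone, Miles alone, and both together:

    trialsmod1 <- do(1000) * lm(Price ~ Age, data = resample(mustangs))
    trialsmod2 <- do(1000) * lm(Price ~ Miles, data = resample(mustangs))
    trialsmod3 <- do(1000) * lm(Price ~ Miles + Age, data = resample(mustangs))

In model 1, each additional year of age is associated with a price between about $1,180 and $2,290 lower (Price is in thousands of dollars):

    confint(trialsmod1)
           name lower  upper
    1 Intercept 23.78 37.150
    2       Age -2.29 -1.181
    3     Sigma  4.01 11.395
    4 r.squared  0.27  0.756

In model 2, each additional mile is associated with a price roughly 16 to 28 cents lower:

    confint(trialsmod2)
           name  lower  upper
    1 Intercept 24.576 36.325
    2     Miles -0.279 -0.159
    3     Sigma  2.809  9.059
    4 r.squared  0.520  0.891

Model 3 includes both Age and Miles:

    confint(trialsmod3)
           name  lower   upper
    1 Intercept 25.453 36.0534
    2     Miles -0.323 -0.0948
    3       Age -0.973  0.7159
    4     Sigma  2.684  9.1336
    5 r.squared  0.520  0.9100

With both variables in the model, the interval for Miles still excludes zero, but the interval for Age includes zero.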

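4.3 Simulation and ANOVA

Consider again the Mustang model of Price with both Miles and Age. Here is an analysis of variance:

    anova(lm(Price ~ Miles + Age, data = mustangs))
    Analysis of Variance Table

    Response: Price
              Df Sum Sq Mean Sq F value Pr(>F)
    Miles      1   2016    2016   46.94  7e-07 ***
    Age        1      4       4    0.09   0.77
    Residuals 22    945      43
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here Age has a large p-value (0.77). Entering Age first gives a different picture:

    anova(lm(Price ~ Age + Miles, data = mustangs))
    Analysis of Variance Table

    Response: Price
              Df Sum Sq Mean Sq F value  Pr(>F)
    Age        1   1454    1454    33.9 7.4e-06 ***
    Miles      1    565     565    13.2  0.0015 **
    Residuals 22    945      43
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now Age appears highly significant. The two tables differ because anova() uses sequential sums of squares. Each term is credited only with the variation it explains beyond the terms listed before it. Age and Miles are correlated, so whichever enters first takes credit for the variation (and the share of R²) that they explain jointly.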
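The order dependence of sequential sums of squares is easy to reproduce with base R's anova(). This sketch uses mtcars, with weight (wt) and horsepower (hp) as two correlated explanatory variables standing in for Miles and Age.

```r
# Base-R illustration of sequential (order-dependent) sums of squares in anova().
a1 <- anova(lm(mpg ~ wt + hp, data = mtcars))
a2 <- anova(lm(mpg ~ hp + wt, data = mtcars))
print(a1)
print(a2)

# The explanatory variables are correlated, which is why the order matters
print(cor(mtcars$wt, mtcars$hp))

# Sum of squares credited to wt when it enters first vs. second
print(c(first = a1["wt", "Sum Sq"], second = a2["wt", "Sum Sq"]))
```

The residual sum of squares and the total are the same in both tables; only the split between the two terms changes.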

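Applied once to a fitted lm(), do() reports the coefficients, the residual standard error (Sigma) and R²:

    do(1) * lm(Price ~ Miles, data = mustangs)
    $Intercept
    [1] 30.5

    $Miles
    [1] -0.219

    $Sigma
    [1] 6.42

    $r.squared
    [1] 0.68

    attr(,"row.names")
    [1] 1
    attr(,"class")
    [1] "do.data.frame"
    do(1) * lm(Price ~ Miles + Age, data = mustangs)
    $Intercept
    [1] 30.9

    $Miles
    [1] -0.205

    $Age
    [1] -0.155

    $Sigma
    [1] 6.55

    $r.squared
    [1] 0.68

    attr(,"row.names")
    [1] 1
    attr(,"class")
    [1] "do.data.frame"

Adding Age raises R² only from 0.680 to 0.681 (both print as 0.68 at three digits). Is that more than a variable unrelated to price would add? Shuffling Age answers the question.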

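    trials1 <- do(1000) * lm(Price ~ Miles + shuffle(Age), data = mustangs)
    confint(trials1)
           name  lower  upper
    1 Intercept 25.704 35.451
    2     Miles -0.232 -0.206
    3       Age -0.581  0.565
    4     Sigma  6.059  6.787
    5 r.squared  0.660  0.727

The observed R² of Price ~ Miles + Age (0.681) lies well inside the range produced with a shuffled Age. So Age adds essentially nothing once Miles is in the model. Now shuffle Miles instead:

    trials2 <- do(1000) * lm(Price ~ shuffle(Miles) + Age, data = mustangs)
    confint(trials2)
           name    lower  upper
    1 Intercept  24.7016 35.631
    2     Miles  -0.0772  0.081
    3       Age  -1.8770 -1.563
    4     Sigma   7.6286  8.571
    5 r.squared   0.4580  0.567

With Miles shuffled, R² stays far below the observed 0.681, so Miles carries information beyond Age.

5 Beyond lm()

Randomization can also be used to examine the p-values from t, F and other tests. As an example, take a chi-squared test of whether homelessness is related to sex in the HELPrct data.

1. Carry out the test on the observed data:

    chisq.test(tally(~ homeless + sex, data = HELPrct, margins = FALSE))

        Pearson's Chi-squared test with Yates' continuity correction

    data:  tally(~homeless + sex, data = HELPrct, margins = FALSE)
    X-squared = 3.87, df = 1, p-value = 0.04913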
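The shuffling comparison for R² can be sketched in base R as well: permute one explanatory variable, refit, and record R². This sketch uses mtcars and shuffles qsec while wt stays in the model. r2() is a hypothetical helper.

```r
# Base-R sketch: compare observed R^2 with R^2 after shuffling one variable.
set.seed(681)
r2 <- function(fit) summary(fit)$r.squared  # hypothetical helper

r2_base <- r2(lm(mpg ~ wt, data = mtcars))         # wt alone
r2_both <- r2(lm(mpg ~ wt + qsec, data = mtcars))  # wt plus the real qsec

# Replace qsec by a random permutation of itself and refit, 1000 times
r2_shuffled <- replicate(1000, {
  d <- mtcars
  d$qsec <- sample(d$qsec)
  r2(lm(mpg ~ wt + qsec, data = d))
})

# Share of shuffles whose R^2 reaches the observed value
p_value <- mean(r2_shuffled >= r2_both)
print(round(c(base = r2_base, observed = r2_both, p = p_value), 3))
```

Adding any variable, even a shuffled one, can only raise R² in a least-squares fit. That is why the observed R² is compared with shuffled versions rather than with the model that omits the variable.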

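2. Extract just the p-value with pval():

    pval(chisq.test(tally(~ homeless + sex, data = HELPrct, margins = FALSE)))
    p.value
     0.0491

3. Shuffle homeless to simulate the null hypothesis of no relationship, and compute the p-value again:

    pval(chisq.test(tally(~ shuffle(homeless) + sex, data = HELPrct, margins = FALSE)))
    p.value
      0.976

4. Repeat the shuffle 1000 times. If the null hypothesis is true, the p-values should be spread roughly uniformly between 0 and 1. About 5% of them should then fall below 0.05:

    trials = do(1000) * pval(chisq.test(tally(~ shuffle(homeless) + sex,
                                              data = HELPrct, margins = FALSE)))
    prop(~(p.value < 0.05), data = trials)
     TRUE
    0.052
    histogram(~p.value, data = trials, width = 0.05)

In this run, 5.2% of the p-values were below 0.05, close to the nominal 5%.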
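Step 4 claims that p-values are roughly uniform when the null hypothesis holds. This can be checked by simulation in base R. The data here are simulated and hypothetical (two unrelated two-level variables for 400 subjects), not the HELPrct data. table() stands in for mosaic's tally().

```r
# Base-R sketch: distribution of chi-squared p-values when the null is true.
set.seed(453)
n <- 400
status <- factor(sample(c("homeless", "housed"), n, replace = TRUE))
sex <- factor(sample(c("female", "male"), n, replace = TRUE, prob = c(0.25, 0.75)))

# Shuffle one variable each time so that the null hypothesis holds by construction.
# For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default.
null_p <- replicate(1000, chisq.test(table(sample(status), sex))$p.value)

# Proportion of p-values below 0.05: expected near, and often a bit below, 0.05
# because the continuity correction is conservative
print(mean(null_p < 0.05))
# hist(null_p, breaks = seq(0, 1, by = 0.05)) would show the distribution
```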

The chi-squared test cannot take a quantitative variable such as age into account, but a logistic regression can. Here is a bootstrap of a logistic regression:

    trials = do(1000) * glm(homeless == "homeless" ~ age + sex,
                            data = resample(HELPrct), family = "binomial")
    confint(trials)
           name     lower   upper
    1 Intercept -2.363803 -0.3915
    2       age -0.000953  0.0482
    3   sexmale  0.035213  0.9432

6 Acknowledgments

The authors thank Sarah Anoke, and Robin Lock (St. Lawrence University), whose USCOTS problems are used in this document. MOSAIC is supported by the US National Science Foundation (DUE-0920350). More information is available at www.mosaic-web.org.

7 References

- G. W. Cobb. The introductory statistics course: a Ptolemaic curriculum? Technology Innovations in Statistics Education, 2007, 1(1).
- B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.
- T. Hesterberg, D. S. Moore, S. Monaghan, A. Clipson and R. Epstein. Bootstrap Methods and Permutation Tests, 2nd edition. W. H. Freeman, New York, 2005.
- D. T. Kaplan. Statistical Modeling: A Fresh Approach, 2nd edition. http://www.mosaic-web.org/statisticalmodeling
- S. C. Mednick, D. J. Cai, J. Kanady and S. P. Drummond. Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory. Behavioural Brain Research, 2008, 193(1):79-86.
- T. Speed. Simulation. IMS Bulletin, 2011, 40(3):18.