
PS001

Stata 11 whitepaper (mwp): contents

  mwp-027
  mwp-028
  mwp-001
  mwp-078
  mwp-079
  mwp-076  functions
  mwp-030  insheet
  mwp-033  recode
  mwp-036  reshape wide/long
  mwp-082  ivregress
  mwp-039  logistic/logit
  mwp-040  logistic/logit
  mwp-029  margins
  mwp-090  mlogit
  mwp-088  ologit
  mwp-087  poisson
  mwp-037  regress
  mwp-038  regress
  mwp-042  anova/oneway
  mwp-043  sdtest (chi-squared and F tests)
  mwp-041  ttest (t test)
  mwp-070  table
  mwp-071  tabstat
  mwp-072  tabulate
  mwp-073  tabulate

(c) 2011 Math   (c) 2011 StataCorp LP
Math   web: www.math-koubou.jp   email: master@math-koubou.jp

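mwp-076 functions

See also generate, [D] functions, and [D] egen (mwp-077).

  1. Random-number functions
     1.1 runiform()
     1.2
  2. Mathematical functions
     2.1 round(), floor(), ceil(), int()
     2.2 Running sum
  3. Probability distributions and density functions
     3.1 t test
     3.2 t distribution
     3.3
     3.4
  4. Programming functions
     4.1 recode(), autocode(), irecode()
  5. String functions
  6. Date and time functions
  7. Matrix functions

(c) Copyright Math   (c) Copyright StataCorp LP (used with permission)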

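1. Random-number functions

Stata's random-number functions are documented in [D] functions, under Random-number functions.

1.1 runiform()

runiform() returns uniform random numbers on the interval [0, 1). The example below creates 10 observations and fills the variable x1. Setting the seed first makes the result repeatable.

. set obs 10
obs was 0, now 10
. set seed 2
. generate x1 = runiform()      [*1]
. list                          [*2]

            x1
   1.  .850512
   2.  .0515642
   3.  .6303533
   4.  .6991696
   5.  .518447
   6.  .3360431
   7.  .1747266
   8.  .9104601
   9.  .3874338
  10.  .409848

To draw uniform random numbers on [a, a + b), use a + b*runiform(). The next example creates x2 on [10, 20).

*1 Menu: Data > Create or change data > Create new variable
*2 Menu: Data > Describe data > List data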

. generate x2 = 10 + 10*runiform()
. list x2

             x2
   1.  19.59647
   2.  11.52971
   3.  16.53234
   4.  14.25097
   5.   14.2031
   6.  17.27702
   7.  14.25147
   8.  18.96779
   9.  10.89719
  10.  12.13104

To turn these values into integers, apply round() or floor().

. generate x3 = round(x2)
. list x2 x3

             x2   x3
   1.  19.59647   20
   2.  11.52971   12
   3.  16.53234   17
   4.  14.25097   14
   5.   14.2031   14
   6.  17.27702   17
   7.  14.25147   14
   8.  18.96779   19
   9.  10.89719   11
  10.  12.13104   12

round() or floor() can also be wrapped directly around the runiform() expression.

. generate x4 = round(10 + 10*runiform())
. list x4

        x4
   1.   16
   2.   14
   3.   20
   4.   20
   5.   15
   6.   18
   7.   15
   8.   16
   9.   18
  10.   13
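The choice between round() and floor() matters when every integer should be equally likely. The following is a minimal sketch, not part of the original text: the variable names x5 and x6 and the sample size of 1,000 are assumptions made for illustration.

```stata
clear
set obs 1000
set seed 2

* floor() maps [10, 21) onto the integers 10, ..., 20 with equal probability
generate x5 = floor(10 + 11*runiform())

* round() maps [10, 20) onto 10, ..., 20, but the two end points
* each receive only half the probability of an interior integer
generate x6 = round(10 + 10*runiform())

tabulate x5
tabulate x6
```

In the two frequency tables, 10 and 20 should appear about half as often under round() as the interior values, while the floor() version should look roughly flat.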

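1.2

2. Mathematical functions

Stata's mathematical functions are documented in [D] functions, under Mathematical functions.

2.1 round(), floor(), ceil(), int()

These four functions convert a real number to an integer. They differ in how they treat values on either side of 0.

(1) x > 0

             round(x)  floor(x)  ceil(x)  int(x)
  x = 4.8        5         4        5        4
  x = 5.2        5         5        6        5

For positive x, int() behaves like floor().

(2) x < 0

             round(x)  floor(x)  ceil(x)  int(x)
  x = -4.8      -5        -5       -4       -4
  x = -5.2      -5        -6       -5       -5

For negative x, int() behaves like ceil().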

2.2 Running sum

sum() returns a running sum. First create a variable x holding 1, 2, ..., 10.

. clear
. set obs 10
. generate x = _n      [*3]

Then compute the running sum of x.

. generate y1 = sum(x)
. list x y1

         x   y1
   1.    1    1
   2.    2    3
   3.    3    6
   4.    4   10
   5.    5   15
   6.    6   21
   7.    7   28
   8.    8   36
   9.    9   45
  10.   10   55

y1 holds $y1_i = \sum_{j=1}^{i} x_j$ $(i = 1, 2, \ldots, 10)$. By contrast, the egen function total() places the grand total $\sum_{j=1}^{10} x_j$ in every observation.

. egen y2 = total(x)      [*4]
. list x y1 y2

         x   y1   y2
   1.    1    1   55
   2.    2    3   55
   3.    3    6   55
   4.    4   10   55
   5.    5   15   55
   6.    6   21   55
   7.    7   28   55
   8.    8   36   55
   9.    9   45   55
  10.   10   55   55

*3 For _n, see [U] 13.4 System variables.
*4 Menu: Data > Create or change data > Create new variable (extended)
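The relationship between sum() and total() can be checked directly, and sum() can be restarted within groups by combining it with by. This is a minimal sketch: the grouping variable g and the variable y3 are hypothetical names introduced only for this illustration.

```stata
clear
set obs 10
generate x  = _n
generate y1 = sum(x)
egen     y2 = total(x)

* the last running sum should equal the grand total
assert y1[_N] == y2[1]

* hypothetical grouping: observations 1-5 and 6-10
generate g = (_n > 5)

* the running sum restarts at the first observation of each group
bysort g (x): generate y3 = sum(x)
list g x y1 y2 y3, sepby(g)
```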

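3. Probability distributions and density functions

These functions are documented in [D] functions, under Probability distributions and density functions. This section uses Student's t distribution as the example.

3.1 t test

The Example dataset fuel.dta is used for a t test.

. use http://www.stata-press.com/data/r11/fuel, clear      [*5]

The dataset holds 12 observations in wide form, in the variables mpg1 and mpg2.

. list, separator(0)

        mpg1   mpg2
   1.     20     24
   2.     23     25
   3.     21     21
   4.     25     22
   5.     18     23
   6.     17     18
   7.     18     17
   8.     24     28
   9.     20     24
  10.     24     27
  11.     23     21
  12.     19     23

The hypothesis mpg1 = mpg2 is tested with a t test at the 5% significance level.

*5 Menu: File > Example Datasets > Stata 11 manual datasets > Base Reference Manual [R] > ttest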

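. ttest mpg1 == mpg2      [*6]

Paired t test

Variable |  Obs     Mean   Std. Err.  Std. Dev.   [95% Conf. Interval]
---------+------------------------------------------------------------
    mpg1 |   12       21   .7881701   2.730301    19.26525   22.73475
    mpg2 |   12    22.75   .9384465   3.250874    20.68449   24.81551
---------+------------------------------------------------------------
    diff |   12    -1.75   .7797144    2.70101    -3.46614  -.0338602

     mean(diff) = mean(mpg1 - mpg2)                      t =  -2.2444
 Ho: mean(diff) = 0                     degrees of freedom =       11

 Ha: mean(diff) < 0        Ha: mean(diff) != 0        Ha: mean(diff) > 0
 Pr(T < t) = 0.0232     Pr(|T| > |t|) = 0.0463      Pr(T > t) = 0.9768

The mean difference between mpg1 and mpg2 is -1.75, the t statistic is -2.2444 with 11 degrees of freedom, and the two-sided p-value is 0.0463. For details on ttest, see [R] ttest and mwp-041.

3.2 t distribution

tden(n, t) returns the density at t of Student's t distribution with n degrees of freedom. The graph for n = 11 is drawn as follows.

. twoway (function tden(11, x), range(-4 4)), ytitle("") xtitle(t)
>        title(t(11) distribution)      [*7]

*6 Menu: Statistics > Summaries, tables and tests > Classical tests of hypotheses > Mean-comparison test, paired data
*7 Menu: Graphics > Twoway graph (scatter, line, etc.)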
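The tail probabilities printed by ttest can be recomputed from the t statistic with ttail(), which returns the upper-tail probability of Student's t distribution. The sketch below simply plugs in the values from the output above (t = -2.2444, 11 degrees of freedom).

```stata
* two-sided p-value: Pr(|T| > |t|)
display 2*ttail(11, abs(-2.2444))

* lower-tail probability: Pr(T < t)
display 1 - ttail(11, -2.2444)

* upper-tail probability: Pr(T > t)
display ttail(11, -2.2444)
```

The three results should agree with the reported 0.0463, 0.0232, and 0.9768 up to rounding of the t statistic.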

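[Figure: density of the t(11) distribution]

The p-value of 0.0463 reported by ttest is the probability, under this distribution, that |t| > 2.2444.

3.3

3.4

4. Programming functions

These functions are documented in [D] functions, under Programming functions.

4.1 recode(), autocode(), irecode()

These functions group a continuous variable into categories. The examples use age01.dta.

. use http://www.math-koubou.jp/stata/data11/age01.dta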

. list, separator(0)

        age
   1.    18
   2.    21
   3.    36
   4.    44
   5.    58
   6.    65
   7.    73
   8.    82

(1) recode()

recode(x, x1, x2, ..., xn) returns the smallest of the boundary values x1, x2, ..., xn that is greater than or equal to x.

. generate code1a = recode(age, 20, 30, 40, 50, 60, 70, 80, 90)
. list age code1a, separator(0)

        age   code1a
   1.    18       20
   2.    21       30
   3.    36       40
   4.    44       50
   5.    58       60
   6.    65       70
   7.    73       80
   8.    82       90

The next example shows what happens when x exceeds the largest boundary xn.

. generate code1b = recode(age, 25, 40, 60)
. list age code1b, separator(0)

        age   code1b
   1.    18       25
   2.    21       25
   3.    36       40
   4.    44       60
   5.    58       60
   6.    65       60
   7.    73       60
   8.    82       60

When x > xn, recode() returns xn.
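recode() returns a boundary value, whereas irecode() returns the index of the interval. The sketch below puts the two side by side for the same cut points. It re-enters the eight ages with input so that it runs without age01.dta; the variable names upper and idx are assumptions for illustration.

```stata
clear
input age
18
21
36
44
58
65
73
82
end

* recode(): upper boundary of the interval containing age
generate upper = recode(age, 25, 40, 60)

* irecode(): 0 if age <= 25, 1 if 25 < age <= 40,
*            2 if 40 < age <= 60, 3 if age > 60
generate idx = irecode(age, 25, 40, 60)

list, separator(0)
```

Note that the two functions treat values above the last cut point differently: recode() folds them into the last boundary, while irecode() gives them their own index.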

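(2) autocode()

(3) irecode()

5. String functions

These functions are documented in [D] functions, under String functions.

6. Date and time functions

Date values such as 10/15/2011 or 01jan2011, for example as entered in the Data Editor, are held internally relative to the base date 01jan1960. Functions such as daily() and monthly() convert date strings into these internal values. See [D] functions, under Date and time functions, and mwp-001.

7. Matrix functions

These functions are documented in [D] functions, under Matrix functions returning a matrix and Matrix functions returning a scalar. Consider the matrix

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}$$

. matrix input A = (1,1,1\1,2,2\1,2,3)
. matrix list A

symmetric A[3,3]
    c1  c2  c3
r1   1
r2   1   2
r3   1   2   3

The inverse of A is obtained with inv().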

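. matrix B = inv(A)
. matrix list B

symmetric B[3,3]
    r1  r2  r3
c1   2
c2  -1   2
c3   0  -1   1

That is,

$$A^{-1} = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}$$

The Cholesky decomposition of A is obtained with cholesky().

. matrix C = cholesky(A)
. matrix list C

C[3,3]
    c1  c2  c3
r1   1   0   0
r2   1   1   0
r3   1   1   1

Computing $CC^{T}$ recovers the original matrix.

. matrix D = C*C'
. matrix list D

symmetric D[3,3]
    r1  r2  r3
r1   1
r2   1   2
r3   1   2   3

This confirms $A = CC^{T}$.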
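Both results can also be checked numerically rather than by eye. The following sketch multiplies A by its inverse and uses mreldif(), a matrix function returning a scalar, to measure how far C*C' is from A. The matrix names I3 and D are chosen only for this illustration.

```stata
matrix input A = (1,1,1\1,2,2\1,2,3)

* A times its inverse should be the 3 x 3 identity matrix
matrix B  = inv(A)
matrix I3 = A*B
matrix list I3

* the Cholesky factor should reproduce A
matrix C = cholesky(A)
matrix D = C*C'

* relative difference between D and A; should be 0 (or very close to it)
display mreldif(D, A)
```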

mwp-082 ivregress

  1. OLS
  2. OLS
  3.
  4.
  5.
  6. ivregress
     6.1
     6.2 2SLS
     6.3 LIML
     6.4 GMM
  7. ivregress postestimation
     7.1 estat endogenous
     7.2 estat firststage
     7.3 estat overid

1. OLS

Consider the simple linear regression model

$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \ldots, n \tag{M1}$$

The parameters $\beta_0$ and $\beta_1$ are estimated by the method of least squares (OLS: ordinary least squares), which rests on the following four assumptions.

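The regressor x is a fixed (non-random) variable, and the error term u has mean 0, constant variance, and no correlation across observations:

$$E(u_i) = 0, \quad i = 1, 2, \ldots, n \tag{M2}$$

$$V(u_i) = E(u_i^2) = \sigma^2, \quad i = 1, 2, \ldots, n \tag{M3}$$

$$\mathrm{Cov}(u_i, u_j) = E(u_i u_j) = 0, \quad i \neq j, \; i, j = 1, 2, \ldots, n \tag{M4}$$

OLS chooses the estimates that minimize the residual sum of squares (RSS):

$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 \tag{M5}$$

Setting the partial derivatives of RSS with respect to $\hat\beta_0$ and $\hat\beta_1$ to zero gives

$$\frac{\partial RSS}{\partial \hat\beta_0} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \tag{M6a}$$

$$\frac{\partial RSS}{\partial \hat\beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0 \tag{M6b}$$

Solving for $\hat\beta_0$ and $\hat\beta_1$:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \left( = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \tag{M7a}$$

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \tag{M7b}$$

2. OLS

(1) Unbiasedness

Let $\hat\theta$ be an estimator of a parameter $\theta$. If the expectation $E(\hat\theta)$ equals $\theta$,

$$E(\hat\theta) = \theta \tag{M8}$$

then $\hat\theta$ is called an unbiased estimator. Under assumptions 1 to 4, the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are the best linear unbiased estimators (BLUE) [Gauss-Markov theorem].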
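Formulas (M7a) and (M7b) can be checked against regress. The sketch below is an illustration only: it assumes the auto.dta dataset shipped with Stata, with mpg as y and weight as x, and relies on the results that correlate stores when the covariance option is given. The scalar names b1, b0, and ybar are made up for this example.

```stata
sysuse auto, clear

* (M7a): slope = sample covariance of (x, y) / sample variance of x
quietly correlate weight mpg, covariance
scalar b1 = r(cov_12)/r(Var_1)

* (M7b): intercept = ybar - b1*xbar
quietly summarize mpg
scalar ybar = r(mean)
quietly summarize weight
scalar b0 = ybar - b1*r(mean)

display "b1 = " b1 "    b0 = " b0

* the same two numbers should appear in the Coef. column
regress mpg weight
```

The n - 1 divisors in the sample covariance and variance cancel, so the ratio equals the ratio of sums in (M7a).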

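(2) Consistency

If $\hat\theta$ converges in probability to $\theta$ as $n$ grows, that is, if for any $\epsilon > 0$

$$\lim_{n \to \infty} P(|\hat\theta - \theta| \geq \epsilon) = 0$$

then $\hat\theta$ is called a consistent estimator of $\theta$. This is written

$$\operatorname{plim} \hat\theta = \theta \tag{M9}$$

The OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are consistent.

3.

(1) Bias when x is correlated with u

From (M1), $y_i - \bar{y} = \beta_1 (x_i - \bar{x}) + (u_i - \bar{u})$. Substituting into (M7a):

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{M10}$$

Taking expectations:

$$E[\hat\beta_1] = \beta_1 + E\left[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] \tag{M11}$$

The second term is 0 when x and u are uncorrelated. When the correlation between x and u is not 0, $\hat\beta_1$ is a biased estimator of $\beta_1$.

(2) Inconsistency when x is correlated with u

Define the probability limits of the sample variance and covariance:

$$\operatorname{plim}(V(x_i)) = \operatorname{plim}\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 / n \right] = \sigma_x^2$$

$$\operatorname{plim}(\mathrm{Cov}(x_i, u_i)) = \operatorname{plim}\left[ \sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u}) / n \right] = \sigma_{xu}$$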

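Then, from (M10),

$$\operatorname{plim}(\hat\beta_1) = \operatorname{plim}\left[ \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] = \beta_1 + \frac{\operatorname{plim}\left[ \sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u})/n \right]}{\operatorname{plim}\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2/n \right]} = \beta_1 + \frac{\sigma_{xu}}{\sigma_x^2} \tag{M12}$$

When $\sigma_{xu} \neq 0$, that is, when x and u are correlated, $\hat\beta_1$ is not a consistent estimator of $\beta_1$.

(3) Measurement error

Suppose the true relationship holds between two variables $\xi$ and $\eta$:

$$\eta_i = \beta_0 + \beta_1 \xi_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$

In place of $\eta$ and $\xi$ we observe $y_i$ and $x_i$, and fit

$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \ldots, n$$

If $x_i = \xi_i + v_i$, where $v_i$ is a measurement error, then $u_i = \epsilon_i - \beta_1 v_i$ and

$$\mathrm{Cov}(x_i, u_i) = \mathrm{Cov}(\xi_i + v_i, \; \epsilon_i - \beta_1 v_i) = -\beta_1 \sigma_v^2 \neq 0$$

so x and u are correlated and OLS is not appropriate.

(4) Simultaneous equations

Let Y be income, C consumption, and I investment:

$$\begin{cases} Y_i = C_i + I_i \\ C_i = \beta_0 + \beta_1 Y_i + u_i \end{cases}$$

Treating I as an exogenous variable and solving for Y and C:

$$Y_i = \frac{\beta_0}{1 - \beta_1} + \frac{1}{1 - \beta_1} I_i + \frac{1}{1 - \beta_1} u_i$$

$$C_i = \frac{\beta_0}{1 - \beta_1} + \frac{\beta_1}{1 - \beta_1} I_i + \frac{1}{1 - \beta_1} u_i$$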

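The covariance between Y and u is therefore

$$\mathrm{Cov}(Y_i, u_i) = E\left[ \frac{u_i^2}{1 - \beta_1} \right] = \frac{\sigma^2}{1 - \beta_1} \neq 0$$

Y is an endogenous variable, and OLS is again not appropriate.

4.

5.

6. ivregress

7. ivregress postestimation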
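The measurement-error case (3) lends itself to a quick simulation. The following is a sketch under assumptions that are not in the original: the true values are set to beta0 = 1 and beta1 = 2, and xi, the equation error, and the measurement error v are all standard normal. With these choices (M12) predicts plim = 2 + (-2*1)/(1 + 1) = 1 for the slope on the mismeasured regressor.

```stata
clear
set obs 10000
set seed 12345

* true regressor and outcome (assumed values: beta0 = 1, beta1 = 2)
generate xi = rnormal(0, 1)
generate y  = 1 + 2*xi + rnormal(0, 1)

* observed regressor = true regressor + measurement error v
generate x  = xi + rnormal(0, 1)

* slope should be close to the true value 2
regress y xi

* slope should be pulled toward 1, as (M12) predicts
regress y x
```

Increasing the number of observations does not remove the gap in the second regression, which is what inconsistency means in practice.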

mwp-037 regress

Contents: 1. (1.1, 1.2, 1.3)  2.  3.  4.  5.

1. regress

. regress mpg weight foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   69.75
       Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
    Residual |  824.171761    71   11.608053           R-squared     =  0.6627
-------------+------------------------------           Adj R-squared =  0.6532
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
     foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
       _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
------------------------------------------------------------------------------

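The output above comes from the Stata Example dataset auto.dta and fits the model

$$mpg = \beta_0 + \beta_1\, weight + \beta_2\, foreign + \epsilon$$

This section explains how to read the output of regress.

1.1

The dataset regress1.dta contains 1,000 observations.

. use http://www.math-koubou.jp/stata/data11/regress1.dta

The first 5 observations are:

. list in 1/5      [*1]

          y   x1   x2   x3
   1.  25.4   49    3    3
   2.  35.8   54    7   10
   3.  36.2   73    9    6
   4.  11.6   26    2    3
   5.  16.8   45    8    3

The variable y was generated as follows.

. generate y = 0.5*x1 + 2*x2 - 10 + rnormal(0, 10)

rnormal(m, s) returns normal random numbers with mean m and standard deviation s (see [D] functions). The underlying model is therefore

$$y = 0.5 x_1 + 2.0 x_2 - 10 \tag{1}$$

y depends on $x_1$ and $x_2$ but not on $x_3$.

*1 Menu: Data > Describe data > List data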

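1.2

regress is now run on all 1,000 observations.

Menu: Statistics > Linear models and related > Linear regression
  Model tab:  Dependent variable: y
              Independent variables: x1 x2 x3

[Figure 1: regress - Model]

. regress y x1 x2 x3

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  3,   996) =  215.47
       Model |  63294.2507     3  21098.0836           Prob > F      =  0.0000
    Residual |  97526.2879   996  97.9179598           R-squared     =  0.3936
-------------+------------------------------           Adj R-squared =  0.3917
       Total |  160820.539   999   160.98152           Root MSE      =  9.8954

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .476147   .0303301    15.70   0.000     .4166288    .5356652
          x2 |   1.974713   .1090362    18.11   0.000     1.760746     2.18868
          x3 |   .0072188   .1083894     0.07   0.947     -.205479    .2199165
       _cons |  -8.906289   1.697041    -5.25   0.000    -12.23647   -5.576103
------------------------------------------------------------------------------

(1) Residual plot

With several regressors the fit cannot be shown in a simple two-dimensional scatter plot, so rvfplot is used instead.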

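. rvfplot, yline(0)      [*2]

[Figure: residuals versus fitted values]

For each observation $(x_{1i}, x_{2i}, \ldots)$, let $\hat{y}_i$ be the fitted value and $y_i$ the observed value. rvfplot (residual-versus-fitted plot) plots the residual $y_i - \hat{y}_i$ against $\hat{y}_i$.

(2) $R^2$

The upper left block of the regress output is an ANOVA table for y. SS stands for sum of squares. Writing $\bar{y}$ for the mean of $y_i$ $(i = 1, \ldots, n)$, the total variation of y splits into two parts:

$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 \tag{2}$$

  $\sum (y_i - \bar{y})^2$         TSS (total sum of squares)      160820.539
  $\sum (\hat{y}_i - \bar{y})^2$   MSS (model sum of squares)      63294.2507
  $\sum (y_i - \hat{y}_i)^2$       RSS (residual sum of squares)   97526.2879

*2 Menu: Statistics > Linear models and related > Regression diagnostics > Residual-versus-fitted plot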
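Decomposition (2) can be verified from the results that regress and summarize leave behind. The sketch below is illustrative only: it uses auto.dta rather than regress1.dta so that it runs without a download, and the scalar name tss is an assumption.

```stata
sysuse auto, clear
quietly regress mpg weight foreign

* TSS = (n - 1) * sample variance of the dependent variable
quietly summarize mpg
scalar tss = r(Var)*(r(N) - 1)

* e(mss) and e(rss) are stored by regress
display "TSS       = " tss
display "MSS + RSS = " e(mss) + e(rss)

* the ratio MSS/TSS should match e(r2)
display "MSS/TSS   = " e(mss)/tss
display "e(r2)     = " e(r2)
```

summarize is an r-class command, so it does not overwrite the e() results from the preceding regress.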

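The coefficient of determination is defined as

$$R^2 = \frac{MSS}{TSS} = 1 - \frac{RSS}{TSS} \tag{3}$$

Here 63294.2507/160820.539 = 0.3936, the value shown as R-squared to the right of the ANOVA table. The model explains about 39% of the variation in y.

(3) p-value of the model

Below R-squared, Prob > F is the p-value of the F test from the analysis of variance (ANOVA). The null hypothesis is

$$H_0 : \beta_1 = \beta_2 = \cdots = 0$$

($\beta_0$ is not included).

(4) Coefficients

Below the ANOVA table, the Coef. (coefficients) column gives $\hat\beta_1 = 0.48$, $\hat\beta_2 = 1.97$, $\hat\beta_3 = 0.01$, and $\hat\beta_0 = -8.91$, which are close to the values in model (1).

(5) p-values of the coefficients

Each p-value comes from a t test of $\beta_j = 0$. For $x_3$ the p-value is 0.947, so $\beta_3 = 0$ is not rejected, and the 95% confidence interval (CI) for $\beta_3$, [-0.21, 0.22], contains 0. For $x_1$ and $x_2$ the p-values are 0 to three decimal places.

The p-values printed by regress only test against 0. To test whether the coefficient on $x_1$ equals 0.5, use the postestimation command test.

Menu: Statistics > Postestimation > Tests > Test linear hypotheses
  Main tab:  Test type for specification 1: Linear expressions are equal
             Specification 1, linear expression: x1 = 0.5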

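[Figure 2: test - Main]

. test (x1 = 0.5)

 ( 1)  x1 = .5

       F(  1,   996) =    0.62
            Prob > F =    0.4318

(6) Standard errors

To the right of Coef., the Std. Err. column gives the standard error, that is, the standard deviation of each coefficient estimate. The 95% CI for $x_1$ can be computed from it. The residual degrees of freedom are 996, and the required quantile of the t distribution is given by invttail(n, p) (see [D] functions).

. display invttail(996, 0.025)      [*3]
1.9623486

This is the critical value for a 95% interval.

. display 0.476147 - 1.9623486*0.0303301
.41662877
. display 0.476147 + 1.9623486*0.0303301
.53566523

These match the 95% CI reported for $x_1$, [0.4166288, 0.5356652].

*3 Menu: Data > Other utilities > Hand calculator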
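The same interval can be computed without retyping numbers, by using the results stored by regress. The sketch below is an illustration that uses auto.dta and the coefficient on weight instead of regress1.dta; the scalar name tcrit is an assumption.

```stata
sysuse auto, clear
quietly regress mpg weight foreign

* critical value from the residual degrees of freedom stored in e(df_r)
scalar tcrit = invttail(e(df_r), 0.025)

* lower and upper limits of the 95% CI for the coefficient on weight
display _b[weight] - tcrit*_se[weight]
display _b[weight] + tcrit*_se[weight]
```

The two displayed values should agree with the [95% Conf. Interval] columns that regress prints for weight.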

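1.3

Section 1.2 used all 1,000 observations. Here the sample size is cut to one tenth by resampling. A seed is set so that the result can be reproduced.

. set seed 111      [*4]

100 observations are drawn at random.

. sample 100, count      [*5]
(900 observations deleted)

regress is then run on the 100 observations.

. regress y x1 x2 x3

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   15.43
       Model |  5157.47143     3  1719.15714           Prob > F      =  0.0000
    Residual |  10693.3387    96  111.388945           R-squared     =  0.3254
-------------+------------------------------           Adj R-squared =  0.3043
       Total |  15850.8101    99  160.109193           Root MSE      =  10.554

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .583658   .1117937     5.22   0.000     .3617492    .8055668
          x2 |   1.565079   .3726188     4.20   0.000     .8254362    2.304721
          x3 |  -.1063396   .3448411    -0.31   0.758    -.7908438    .5781646
       _cons |  -12.28143   6.422559    -1.91   0.059    -25.03011    .4672456
------------------------------------------------------------------------------

$R^2$ and the result for $x_3$ are similar to section 1.2, but the standard errors are 3 to 4 times larger. The values 0.37 and 0.8 lie just inside the 95% CI for $x_1$, so a test at the 5% level should not reject either of them. This is checked with test.

*4 See [R] set seed.
*5 Menu: Statistics > Resampling > Draw random sample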

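. test (x1 = 0.37)

 ( 1)  x1 = .37

       F(  1,    96) =    3.65
            Prob > F =    0.0590

. test (x1 = 0.8)

 ( 1)  x1 = .8

       F(  1,    96) =    3.74
            Prob > F =    0.0559

Both p-values exceed 0.05, so neither hypothesis is rejected at the 5% significance level. The confidence level can be changed with Confidence level on the Reporting tab (Figure 3).

2.

This section uses the Stata Example dataset auto.dta.

. sysuse auto.dta      [*6]
(1978 Automobile Data)

The dataset describes 74 automobiles from 1978. The dependent variable is mpg (miles per gallon) and the regressors are weight and its square. The squared term can be written with the factor-variable notation c.weight#c.weight (see mwp-028), but here a new variable is created explicitly.

. generate weightsq = weight^2      [*7]

*6 Menu: File > Example datasets > Example datasets installed with Stata
*7 Menu: Data > Create or change data > Create new variable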

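. format weightsq %10.0g      [*8]
. list mpg weight weightsq in 1/5      [*9]

        mpg   weight   weightsq
   1.    22    2,930    8584900
   2.    17    3,350   11222500
   3.    22    2,640    6969600
   4.    20    3,250   10562500
   5.    15    4,080   16646400

mpg is regressed on weight and its square.

. regress mpg weight weightsq      [*10]

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   72.80
       Model |  1642.52197     2  821.260986           Prob > F      =  0.0000
    Residual |  800.937487    71  11.2808097           R-squared     =  0.6722
-------------+------------------------------           Adj R-squared =  0.6630
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.3587

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0141581   .0038835    -3.65   0.001    -.0219016   -.0064145
    weightsq |   1.32e-06   6.26e-07     2.12   0.038     7.67e-08    2.57e-06
       _cons |   51.18308   5.767884     8.87   0.000     39.68225    62.68392
------------------------------------------------------------------------------

The fitted equation is

$$mpg = -1.42 \times 10^{-2}\, weight + 1.32 \times 10^{-6}\, weight^2 + 51.18$$

Because weight and $weight^2$ differ by several orders of magnitude ($weight^2$ is of order $10^6$ to $10^7$), the raw coefficients are hard to compare. Normalizing each variable to mean 0 and standard deviation 1 makes them comparable, and the beta option reports the resulting standardized coefficients.

*8 Variables window: right-click weightsq > Format weightsq
*9 Menu: Data > Describe data > List data
*10 Menu: Statistics > Linear models and related > Linear regression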
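The quadratic fit can also be specified without creating weightsq, using the factor-variable notation c.weight#c.weight. A minimal sketch on auto.dta:

```stata
sysuse auto, clear

* c.weight#c.weight is the interaction of the continuous variable
* weight with itself, that is, weight squared
regress mpg weight c.weight#c.weight

* equivalent shorthand: ## expands to the main effect plus the interaction
regress mpg c.weight##c.weight
```

Both commands should reproduce the coefficients of regress mpg weight weightsq. One practical advantage of this notation is that postestimation commands such as margins then know that the two terms involve the same variable.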

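In the regress dialog:
  Reporting tab:  Standardized beta coefficients: checked

[Figure 3: regress - Reporting]

. regress mpg weight weightsq, beta

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   72.80
       Model |  1642.52197     2  821.260986           Prob > F      =  0.0000
    Residual |  800.937487    71  11.2808097           R-squared     =  0.6722
-------------+------------------------------           Adj R-squared =  0.6630
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.3587

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
      weight |  -.0141581   .0038835    -3.65   0.001                -1.901918
    weightsq |   1.32e-06   6.26e-07     2.12   0.038                 1.104148
       _cons |   51.18308   5.767884     8.87   0.000                        .
------------------------------------------------------------------------------

In terms of the beta coefficients, the contribution of weight is about 1.7 times that of $weight^2$ in absolute value.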

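3.

In auto.dta, consider the relationship between weight and length. When length = 0 the weight should also be 0, so it is natural to drop the constant term $\beta_0$. To impose $\beta_0 = 0$, specify the noconstant option.

Menu: Statistics > Linear models and related > Linear regression
  Model tab:  Dependent variable: weight
              Independent variables: length
              Suppress constant term: checked

. regress weight length, noconstant

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    73) = 3450.13
       Model |   703869302     1   703869302           Prob > F      =  0.0000
    Residual |  14892897.8    73  204012.299           R-squared     =  0.9793
-------------+------------------------------           Adj R-squared =  0.9790
       Total |   718762200    74   9713002.7           Root MSE      =  451.68

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      length |   16.29829   .2774752    58.74   0.000     15.74528     16.8513
------------------------------------------------------------------------------

weight increases by about 16.3 for each unit of length.

4.

This section uses the Example dataset census9.dta.

. use http://www.stata-press.com/data/r11/census9, clear      [*11]
(1980 Census data by state)

The data come from the 1980 census. The first 5 observations are listed next.

*11 Menu: File > Example Datasets > Stata 11 manual datasets > Base Reference Manual [R] > regress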

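. list state drate pop medage region in 1/5, nolabel      [*12]

             state   drate          pop   medage   region
   1.      Alabama      91    3,893,888    29.30        3
   2.       Alaska      40      401,851    26.10        4
   3.      Arizona      78    2,718,215    29.20        4
   4.     Arkansas      99    2,286,435    30.60        3
   5.   California      79   23,667,902    29.90        4

The death rate (drate) is regressed on median age (medage) and region (region). region takes the values Northeast, North Central, South, and West, coded 1, 2, 3, 4.

drate and medage are state-level averages, so each observation should be weighted by the state population pop (see mwp-027).

[Figure 4: regress - Weights]

The Weights tab offers Analytic weights and Frequency weights, among others. Here analytic weights are appropriate: the Alabama record, for example, stands for an average over 3,893,888 people with a median age of 29.3.

*12 The nolabel option lists region as numeric codes rather than value labels.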

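Menu: Statistics > Linear models and related > Linear regression
  Model tab:    Dependent variable: drate
                Independent variables: medage i.region      [*13]
  Weights tab:  Analytic weights: pop

. regress drate medage i.region [aweight = pop]
(sum of wgt is   2.2591e+08)

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  4,    45) =   37.21
       Model |   4096.6093     4  1024.15232           Prob > F      =  0.0000
    Residual |  1238.40987    45  27.5202192           R-squared     =  0.7679
-------------+------------------------------           Adj R-squared =  0.7472
       Total |  5335.01916    49  108.877942           Root MSE      =   5.246

------------------------------------------------------------------------------
       drate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      medage |   4.283183   .5393329     7.94   0.000     3.196911    5.369455
             |
      region |
          2  |   .3138738   2.456431     0.13   0.899    -4.633632     5.26138
          3  |  -1.438452   2.320244    -0.62   0.538    -6.111663    3.234758
          4  |  -10.90629   2.681349    -4.07   0.000    -16.30681   -5.505777
             |
       _cons |  -39.14727   17.23613    -2.27   0.028    -73.86262   -4.431915
------------------------------------------------------------------------------

The weights are specified on the command line as [aweight = pop].

5.

regress estimates by ordinary least squares (OLS). OLS assumes that the error variance is constant (homoskedasticity). This section uses the Example dataset auto.dta.

. sysuse auto, clear
(1978 Automobile Data)
. replace weight = weight/1000      [*14]

*13 For the i. operator, see mwp-028.
*14 Menu: Data > Create or change data > Change contents of variable. After this step weight is measured in thousands.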

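. regress mpg weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
       Model |  1591.99024     1  1591.99024           Prob > F      =  0.0000
    Residual |  851.469221    72  11.8259614           R-squared     =  0.6515
-------------+------------------------------           Adj R-squared =  0.6467
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -6.008687   .5178782   -11.60   0.000    -7.041058   -4.976316
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------

. twoway (scatter mpg weight) (lfit mpg weight), ytitle(mpg)      [*15]

[Figure: scatter plot of mpg against weight with the fitted line]

With a single regressor, the residuals can be inspected with rvpplot (residual-versus-predictor plot).

. rvpplot weight, yline(0)      [*16]

*15 Menu: Graphics > Twoway graph (scatter, line, etc.)
*16 Menu: Statistics > Linear models and related > Regression diagnostics > Residual-versus-predictor plot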

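[Figure: rvpplot, residuals against weight]

The spread of the residuals changes with weight, which suggests heteroskedasticity. This can be tested formally.

Menu: Statistics > Postestimation > Reports and statistics
  estat dialog:  Reports and statistics: Tests for heteroskedasticity (hettest)

[Figure 5: estat]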

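. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of mpg

         chi2(1)      =    11.05
         Prob > chi2  =   0.0009

estat hettest tests the null hypothesis $H_0$ of constant variance. The p-value is 0.0009, so $H_0$ is rejected and the OLS assumption does not hold. The OLS coefficient estimates can still be used, but the standard errors (SE) should be replaced by robust ones. In the regress dialog this is set on the SE/Robust tab.

Menu: Statistics > Linear models and related > Linear regression
  Model tab:      Dependent variable: mpg
                  Independent variables: weight
  SE/Robust tab:  Robust

[Figure 6: regress - SE/Robust]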

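. regress mpg weight, vce(robust)

Linear regression                                      Number of obs =      74
                                                       F(  1,    72) =  105.83
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6515
                                                       Root MSE      =  3.4389

------------------------------------------------------------------------------
             |               Robust
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -6.008687   .5840839   -10.29   0.000    -7.173037   -4.844337
       _cons |   39.44028    1.98832    19.84   0.000     35.47664    43.40393
------------------------------------------------------------------------------

With vce(robust) the coefficients are the same as under OLS, but the standard errors and the 95% CIs change:

              OLS                Robust
  weight      [-7.04, -4.98]     [-7.17, -4.84]
  _cons       [36.22, 42.66]     [35.48, 43.40]

The SE/Robust tab also offers Clustered robust standard errors.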
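The two sets of standard errors can be placed side by side with estimates store and estimates table instead of being copied by hand. This is a sketch: the stored-result names ols and robust are arbitrary labels chosen for this illustration.

```stata
sysuse auto, clear
replace weight = weight/1000

* conventional OLS standard errors
quietly regress mpg weight
estimates store ols

* heteroskedasticity-robust standard errors
quietly regress mpg weight, vce(robust)
estimates store robust

* coefficients with standard errors underneath, one column per model
estimates table ols robust, b(%9.4f) se(%9.4f)
```

The coefficient rows should be identical across the two columns, and only the standard-error rows should differ.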

Menu steps for the twoway graph:

Menu: Graphics > Twoway graph (scatter, line, etc.)
  Plots tab, Create (Plot 1):
    Choose a plot category and type: Basic plots
    Basic plots: Scatter
    Y variable: mpg
    X variable: weight
  Plots tab, Create (Plot 2):
    Choose a plot category and type: Fit plots
    Fit plots: Linear prediction
    Y variable: mpg
    X variable: weight
  Y axis tab:
    Title: mpg

. twoway (scatter mpg weight) (lfit mpg weight), ytitle(mpg)

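mwp-042 anova/oneway

  1.
  2.
  3. ANOVA: oneway
  4. ANOVA: anova
  5.
  6. ANOVA
  7. ANOVA
  8. ANOVA

1.

The means of two groups can be compared with a t test (see mwp-041). With three groups A, B, and C, repeating the t test for every pair causes a problem. Suppose each test uses a significance level $\alpha$ of 5%. The probability of correctly not rejecting a true null hypothesis is then

  (1) A-B: 0.95
  (2) A-C: 0.95
  (3) B-C: 0.95

The probability that all three tests reach the correct conclusion is $0.95^3 = 0.86$, so the probability that at least one of (1), (2), (3) is wrongly rejected is $1 - 0.86 = 0.14$. The overall error rate is no longer 5%.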
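The arithmetic above extends to any number of groups, since k groups give comb(k, 2) pairwise comparisons. The sketch below evaluates the error rate for three and four groups. Like the calculation in the text, it treats the comparisons as independent, which is a simplification.

```stata
* three groups: 3 pairwise tests, each at the 5% level
display 1 - 0.95^3

* four groups: comb(4, 2) = 6 pairwise tests
display comb(4, 2)
display 1 - 0.95^comb(4, 2)
```

With four groups the chance of at least one false rejection already exceeds 25% under this approximation.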

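For three or more groups, the analysis of variance (ANOVA) is used instead. It relies on an F test, and multiple-comparison procedures are available to find which groups differ.

2.

  (1)
  (2)
  (3)
  (4) Repeated-measures ANOVA

3. ANOVA: oneway

When the groups are defined by a single factor, the analysis is called one-way ANOVA. Stata provides two commands, anova and oneway. This section uses oneway with anova1.dta.

. use http://www.math-koubou.jp/stata/data11/anova1.dta, clear

The dataset holds 24 blood pressure measurements.

. list if _n <= 3 | _n >= 22, separator(3)      [*1]

         bp   drug
   1.   126      1
   2.   121      1
   3.   115      1

  22.   137      4
  23.   139      4
  24.   123      4

drug takes the values 1, 2, 3, 4 and identifies 4 drugs. [*2]

*1 Menu: Data > Describe data > List data
*2 Stata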

The full data, with group means in the last column:

  drug    bp                               mean
     1    126  121  115  123  125  113    120.5
     2    112  123  115  129  106  108    115.5
     3    123  112  133  124  130  121    123.8
     4    122  132  125  137  139  123    129.7

oneway is run with a significance level $\alpha$ of 5%.

Menu: Statistics > Linear models and related > ANOVA/MANOVA > One-way ANOVA
  Main tab:  Response variable: bp
             Factor variable: drug
             Multiple-comparison tests: Bonferroni
             Output: Produce summary table: checked

[Figure 1: oneway - Main]

. oneway bp drug, bonferroni tabulate

            |          Summary of bp
       drug |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |       120.5   5.3572381           6
          2 |       115.5   8.9162773           6
          3 |   123.83333   7.3598007           6
          4 |   129.66667   7.3665913           6
------------+------------------------------------
      Total |     122.375   8.6467511          24

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      636.458333      3   212.152778      3.92     0.0237
 Within groups      1083.16667     20   54.1583333
------------------------------------------------------------------------
    Total             1719.625     23   74.7663043

Bartlett's test for equal variances:  chi2(3) =   1.1493  Prob>chi2 = 0.765

                   Comparison of bp by drug
                         (Bonferroni)
Row Mean-|
Col Mean |          1          2          3
---------+---------------------------------
       2 |         -5
         |      1.000
         |
       3 |    3.33333    8.33333
         |      1.000      0.383
         |
       4 |    9.16667    14.1667    5.83333
         |      0.260      0.020      1.000

(1) ANOVA table

The tabulate option adds the summary table at the top. The frequency is 6 in every group. Both oneway and anova also handle unbalanced data. The ANOVA results appear in the Analysis of Variance table and the Bartlett's test line of the output above.

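SS stands for sum of squares. As explained for regress in mwp-037, the variation of y decomposes as

$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$

  $\sum (y_i - \bar{y})^2$         TSS (total sum of squares)      total             1719.63
  $\sum (\hat{y}_i - \bar{y})^2$   MSS (model sum of squares)      between groups     636.46
  $\sum (y_i - \hat{y}_i)^2$       RSS (residual sum of squares)   within groups     1083.17

Dividing each SS by its degrees of freedom (df) gives the mean squares (MS), 212.15 and 54.16. Their ratio is the F statistic, 212.15/54.16 = 3.92. The p-value from the F distribution is 0.0237. Since p < 0.05, the ANOVA rejects the hypothesis that all group means are equal.

Bartlett's test checks the equal-variance assumption behind ANOVA. Its p-value exceeds 0.05, so equal variances are not rejected.

(2) Multiple comparison

ANOVA rejects $\mu_1 = \mu_2 = \mu_3 = \mu_4$ but does not say which means differ. The bonferroni option adds Bonferroni multiple-comparison tests below the ANOVA table.

                   Comparison of bp by drug
                         (Bonferroni)
Row Mean-|
Col Mean |          1          2          3
---------+---------------------------------
       2 |         -5
         |      1.000
         |
       3 |    3.33333    8.33333
         |      1.000      0.383
         |
       4 |    9.16667    14.1667    5.83333
         |      0.260      0.020      1.000

For each pair, the upper entry $M_{ij}$ is the difference in means $\mu_i - \mu_j$ (row mean minus column mean), and the lower entry is the Bonferroni-adjusted p-value for the hypothesis that the difference is 0. Only $\mu_4 - \mu_2$ is significant (p = 0.020). Besides Bonferroni, the Scheffe and Šidák methods are available.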
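The F statistic and its p-value can be recomputed from the sums of squares with Ftail(), which returns the upper-tail probability of the F distribution. The sketch below plugs in the values from the ANOVA table; the scalar names are assumptions.

```stata
* mean squares = sum of squares / degrees of freedom
scalar ms_between = 636.458333/3
scalar ms_within  = 1083.16667/20

* F statistic
scalar F = ms_between/ms_within
display F

* p-value: upper tail of the F(3, 20) distribution
display Ftail(3, 20, F)
```

The results should agree with the F = 3.92 and Prob > F = 0.0237 printed by oneway.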

4. ANOVA: anova

5.

6. ANOVA

7. ANOVA

8. ANOVA

mwp-070 table

Keywords: table, superrows, supercolumns

  1. table
  2.
  3.
  4.

1. table

table produces tables of summary statistics. It supports n-way tables, with n up to 7. The statistics that can be shown include:

  frequency
  mean
  standard deviation
  maximum
  minimum
  median
  interquartile range
  percentile

2.

This section uses the Example dataset auto.dta.

. sysuse auto.dta      [*1]
(1978 Automobile Data)

The dataset describes 74 automobiles from 1978. The variable displacement is converted to litres.

. generate disp1 = 16.4*displacement/1000      [*2]

The distribution of disp1 is checked with summarize.

. summarize disp1      [*3]

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       disp1 |        74    3.235676     1.50613     1.2956       6.97

disp1 ranges from 1.3 L to 7.0 L. A categorical variable class is built from disp1 as follows:

  class   range
    0     disp1 <= 2.0
    1     2.0 < disp1 <= 3.0
    2     3.0 < disp1 <= 4.0
    3     4.0 < disp1

. generate class = irecode(disp1, 2.0, 3.0, 4.0)      [*4]

The first 5 observations show disp1 and class.

. list make disp1 class in 1/5

                 make    disp1   class
   1.     AMC Concord   1.9844       0
   2.       AMC Pacer   4.2312       3
   3.      AMC Spirit   1.9844       0
   4.   Buick Century   3.2144       2
   5.   Buick Electra     5.74       3

*1 Menu: File > Example datasets > Example datasets installed with Stata
*2 Menu: Data > Create or change data > Create new variable
*3 Menu: Statistics > Summaries, tables and tests > Summary and descriptive statistics > Summary statistics
*4 For irecode(), see [D] functions.

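Statistics for mpg (miles per gallon) are tabulated by class.

Menu: Statistics > Summaries, tables, and tests > Tables > Table of summary statistics (table)
  Main tab:  Row variable: class
             Statistics 1: Count nonmissing, mpg
             Statistics 2: Mean, mpg
             Statistics 3: Standard deviation, mpg

[Figure 1: table - Main]

. table class, contents(count mpg mean mpg sd mpg)

----------------------------------------------
    class |     N(mpg)   mean(mpg)     sd(mpg)
----------+-----------------------------------
        0 |         23     27.2609    5.100957
        1 |         13     20.6154    3.819652
        2 |         19     19.4211    2.734873
        3 |         19     16.4211     3.48514
----------------------------------------------

mpg falls as class rises. The table is easier to read once value labels are attached to class. [*5]

*5 Value labels can also be set in the Variables Manager.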

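. label define class 0 "<2000cc" 1 "2000-3000cc" 2 "3000-4000cc" 3 ">4000cc"
. label values class class
. table class, contents(count mpg mean mpg sd mpg)

--------------------------------------------------
       class |     N(mpg)   mean(mpg)     sd(mpg)
-------------+------------------------------------
     <2000cc |         23     27.2609    5.100957
 2000-3000cc |         13     20.6154    3.819652
 3000-4000cc |         19     19.4211    2.734873
     >4000cc |         19     16.4211     3.48514
--------------------------------------------------

The display format of the cells is set on the Options tab of the table dialog.

  Options tab:  Override display format for numbers in cells: %9.2f

[Figure 2: table - Options]

. table class, contents(count mpg mean mpg sd mpg) format(%9.2f)

--------------------------------------------------
       class |     N(mpg)   mean(mpg)     sd(mpg)
-------------+------------------------------------
     <2000cc |         23       27.26        5.10
 2000-3000cc |         13       20.62        3.82
 3000-4000cc |         19       19.42        2.73
     >4000cc |         19       16.42        3.49
--------------------------------------------------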

3.

4.