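1 Introduction

The Stata web site [1] describes the package as follows: "Stata is used by medical researchers, biostatisticians, epidemiologists, economists, sociologists, political scientists, geographers, psychologists, social scientists, and other research professionals needing to analyze data."

Stata is a canned statistical package. Other software used for econometric work falls into two groups:

- Packages such as TSP, RATS, SHAZAM, EViews, LIMDEP, SPSS and SAS.
- Programming environments such as Ox, C, Matlab and GAUSS, in which the user writes the estimators.

The examples in this guide use Stata [2]. Stata can be extended with ado files, which are programs that can be downloaded from the web. Some are official ado files distributed by Stata Corp.; others are written by users.

[1] http://www.stata.com/
[2] Stata 7 Special Edition.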
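2 Installing and Updating Stata on a PC

2.1 Updating Stata

Stata and its official ado files are updated from the menu Help > Official Update. Stata shows the dates of the installed files together with the message

    Recommendation: compare these dates with what is available from http://www.stata.com/

The command update query lists the "currently installed" and "last available" dates. It also prints a recommendation, for example to update the executable (wstata.exe) or the ado files.

Besides the official ado files from Stata Corp., many user-written ado files are available (for example, a Box-Cox routine). They can be searched for and installed from Help > STB and User-written programs. Once installed, their help files can be read like the help for any other command.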
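2.2 Window Layout

Stata opens four windows: Stata Results, Review, Variables and Stata Command. Their fonts can be changed with Fonts. The arrangement can be kept for later sessions with Prefs > Save Windowing preferences.

3 The Stata Windows

3.1 Stata Results
Displays the output of commands.

3.2 Stata Command
Commands are typed here and executed by Stata.

3.3 Review
Lists the commands entered so far. Clicking a command copies it into the Stata Command window, where it can be edited and run again.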
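3.4 Variables
Lists the variables of the dataset in memory. Clicking a variable name pastes it into the Stata Command window.

4 Basic Settings

4.1 Memory
By default Stata reserves 3MB of memory for data. To load a larger dataset, increase the allocation before opening it. For example,

    set mem 30m

reserves 30MB.

4.2 Log Files
A log file records what appears in the Stata Results window. Start one with Open Log. If the file already exists, choose Append to add to it or Overwrite to replace it. Stop recording with Close log file.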
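4.3 Suppressing --more--
When output fills the Stata Results window, Stata pauses and displays --more--. The command

    set more off

turns this pause off, which is convenient when running do-files.

5 Data

Stata datasets are saved with the extension .dta.

5.1 Variable Names and Labels
Each variable has a Name and a Label. The following words are reserved and cannot be used as variable names [7]:

    _all _b byte _coef _cons double float if in int long
    _n _N _pi _pred _rc _se _skip using with

[7] As of Stata 7.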
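The Variables window shows both the Name and the Label of each variable.

5.2 Entering Data in the Data Editor
Data can be typed directly into the Data editor.

5.3 Importing from Excel
Put the variable names in the first row of the Excel sheet and copy the range. Then paste it into the Data editor with Edit > Paste; the first row supplies the variable names.

5.4 Text Files
Tab- or comma-separated text files are read with insheet.

5.5 Fixed-Format Files and .dct Dictionaries
Fixed-column data are read with infix and a dictionary file:

    infix using "C:\My Documents\sick.dct"

where sick.dct contains

    infix dictionary using "C:\My Documents\sick.dat" {
        hhold 1-5
        ...
    }

This reads the variable hhold from columns 1-5 of sick.dat. The remaining variables are declared in the same way.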
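5.6 Variable Labels
To label a variable in the Data Editor, double-click its column to open the Stata Variable Information dialog and type the Label there. In a do-file, use

    label var hhold "Household ID"

which gives hhold the label "Household ID".

5.7 Opening Stata Datasets
Open a .dta file with File > Open. If the dataset is larger than the memory reserved for data (3MB by default), first increase it with set mem.

6 Help and Working Efficiently

6.1 Help
Help > Search searches Stata's documentation. Error codes can be looked up too. For error 191, for example, type

    search 191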
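6.2 Data Browser
The Data Browser displays the data in the same layout as the Data Editor but does not allow changes. It is therefore the safer choice when you only want to look at the data.

6.3 Do-Files
A do-file is a text file of Stata commands that are executed in sequence. It can be written in the Do-file editor and run from there.

Comments are enclosed in /* */. A comment that opens at the end of one line and closes at the start of the next joins the two lines, so a long command can continue onto the next line. An example:

    set mem 30m
    set more off
    /* Converting sick report to Stata format */
    quietly infix using "C:\My Documents\2001\Ii\Intage\sicka.dct"
    save "C:\My Documents\2001\Ii\Intage\sicka.dta", replace
    /* generating data about 1st time to go to a doctor */
    generate godoc1st = 1 if cope1n1 == 1
    quietly replace godoc1st = d1d + 1 /*
        */ if cope1n1 == . & cope2n1 == 1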
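7 Data Management [9]

7.1 Describing the Data
describe, codebook, list and inspect show the contents of the data, and sum (summarize) reports summary statistics. A range of variables is written with a hyphen: sum hosd1-hosd30 summarizes hosd1, hosd2, ..., hosd30. Other descriptive commands are:

- centile: percentiles.
- table and tabulate: tables.
- correlate: correlations.

7.2 Restricting Commands with if
Most commands accept an if condition. For example,

    table duration godoc1st if age >= 40 & duration < 11

tabulates duration against godoc1st only for observations with age 40 or over and duration less than 11. Conditions use three kinds of operator:

- Relational: == >= <= > < ~= (~= means "not equal"; equality is tested with ==, not =).
- Logical: & ("and").
- Arithmetic: + - / * ^.

[9] For graphs, see the Help for graph.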
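Repeating a command for each value of a variable, as in

    sum duration if sickname == 1
    sum duration if sickname == 2
    ...

is done more simply with by. The data must first be sorted by the by-variable:

    gsort sickname
    by sickname : sum duration

This runs sum duration separately for each value of sickname.

Some repetitions cannot be handled by by, for example

    sum duration if age <= 9 & age >= 0
    sum duration if age <= 19 & age >= 10
    ...

These can be written with for:

    for num 0/9 : sum duration if age <= X9 & age >= X0

for substitutes each number from 0 to 9 for the placeholder X, so the command runs for ages 0-9, 10-19, ..., 90-99. The placeholder must be an uppercase X. When several lists are used, their placeholders are X, Y and Z.

by and for can be combined. Lists are separated by \, and so are commands:

    for num 2/29 \ num 3/30 : gsort hosdcX \ by hosdcX : sum hosdY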
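This is equivalent to

    gsort hosdc2
    by hosdc2 : sum hosd3
    gsort hosdc3
    by hosdc3 : sum hosd4
    ...
    gsort hosdc29
    by hosdc29 : sum hosd30

7.3 Creating Variables
generate creates a new variable:

    generate hosdc2 = hosd1 + hosd2

Expressions may contain functions such as log(). egen provides further functions such as max, min, sum and median.

Observation subscripts
_n is the number of the current observation, so hos[_n-1] is the value of hos in the previous observation [11]. Under by, _n counts within each group. To add each value of hos to the previous value for the same id:

    sort id time
    by id: generate dhos = hos[_n] + hos[_n-1]

The first observation of each id has no previous value within its group, so its dhos is missing. To use the next observation instead:

    sort id time
    by id: generate dhos = hos[_n+1] + hos[_n]

[11] For time-series data, see also the ts commands.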
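The following Python sketch is an illustration, not Stata code. It reproduces what sort id time followed by by id: generate dhos = hos[_n] + hos[_n-1] computes. The function name by_lag_sum and the sample rows are hypothetical, and None stands in for Stata's missing value.

```python
def by_lag_sum(rows):
    """rows: list of (id, time, hos) tuples; returns a list of (id, time, dhos) tuples."""
    rows = sorted(rows, key=lambda r: (r[0], r[1]))  # sort id time
    out = []
    prev_id = object()   # sentinel that never equals a real id
    prev_hos = None
    for rid, t, hos in rows:
        if rid != prev_id:
            prev_hos = None  # first obs of a new id group: hos[_n-1] does not exist
        # as in Stata, any arithmetic involving a missing value is missing
        dhos = None if hos is None or prev_hos is None else hos + prev_hos
        out.append((rid, t, dhos))
        prev_id, prev_hos = rid, hos
    return out


print(by_lag_sum([(1, 2, 3), (1, 1, 1), (2, 1, 5), (2, 2, 4), (1, 3, 0)]))
```

Sorting by (id, time) before the loop matters. Without it, "previous observation" would refer to whatever order the rows happen to be in, which is why the Stata version starts with sort id time.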
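replace changes the values of an existing variable. For example:

    generate avedoc = 0 if godoc1st == .
    for num 2/30 : quietly replace avedoc = hosdcX / duration if duration == X
    quietly replace avedoc = 1 if duration == 1 & godoc1st == 1

avedoc is 0 when godoc1st is missing. When duration equals X, avedoc is hosdcX divided by duration.

Dummy variables
A dummy equal to 1 when income is 700 or more is created with

    generate inc700d = 1 if income >= 700 & income ~= .
    replace inc700d = 0 if income < 700

The condition income ~= . is needed because Stata treats a missing value as larger than any number. Without it, income >= 700 would also be true for missing income.

For a categorical variable such as pref, dummies for all categories can be created at once instead of with repeated generate and replace:

    tabulate pref, generate(pref)

This creates pref1, pref2, ..., each labeled with its condition (for example, pref==1).

7.4 Selecting Observations and Variables
Observations and variables are selected with keep and drop.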
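keep if and drop if keep or drop the observations that satisfy a condition. keep varlist and drop varlist keep or drop variables.

Combining Two Datasets
Datasets are combined with append or merge.

append adds observations. With one dataset in memory,

    append using newdata

adds the observations of newdata.dta after the existing ones. A variable that exists in only one of the datasets is set to missing for the observations from the other. Put the file name after using in double quotes if the path contains spaces.

merge adds variables by matching observations on a key variable:

    merge recid using ds2

This matches the data in memory with ds2.dta on recid; both datasets must be sorted by recid. merge creates the variable _merge, which records where each observation came from:

- 1: the data in memory only.
- 2: ds2 only.
- 3: both.

_merge must be dropped (drop _merge) before the next merge. To save memory, keep only the variables you need in each file, then save it, and later use the smaller file before merging. Variables are renamed with rename oldname newname.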
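Here is a minimal Python sketch of how merge pairs observations: a one-to-one match-merge on a key that assigns Stata's _merge codes. The codes are 1 for the data in memory only, 2 for the using data only, and 3 for both. The sketch assumes each key occurs at most once in each dataset. The function name match_merge and the records are hypothetical. For variables present in both datasets it keeps the master's values, which is what merge does when no update option is given.

```python
def match_merge(master, using, key):
    """One-to-one merge of two lists of dicts on `key`, adding a _merge code."""
    m_index = {row[key]: row for row in master}
    u_index = {row[key]: row for row in using}
    merged = []
    for k in sorted(set(m_index) | set(u_index)):
        m, u = m_index.get(k), u_index.get(k)
        row = {}
        if u is not None:
            row.update(u)
        if m is not None:
            row.update(m)  # master values win for variables present in both
        if m is not None and u is not None:
            row["_merge"] = 3
        elif m is not None:
            row["_merge"] = 1
        else:
            row["_merge"] = 2
        # a variable absent from one source is simply absent here (missing in Stata)
        merged.append(row)
    return merged


master = [{"recid": 1, "a": 10}, {"recid": 2, "a": 20}]
using = [{"recid": 2, "a": 99, "b": 5}, {"recid": 3, "b": 7}]
for row in match_merge(master, using, "recid"):
    print(row)
```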
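8 Estimation

A t test of the difference in means between two groups is run with ttest varname, by(groupvar). For the syntax and options of any estimation command, use Help > Search or the Reference Manual.

8.1 OLS
Consider the model
$$duration_i = \beta_0 + \beta_1 gender_i + \beta_2 inc700d_i + \varepsilon_i$$
Under the Gauss-Markov assumptions below, the OLS estimators of $\beta_0, \beta_1, \beta_2$ are BLUE.

1. Linearity: $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i$.
2. Strict exogeneity: $E[\varepsilon_i \mid X] = E[\varepsilon_i \mid x_1, x_2, \dots, x_n] = 0$.
3. No multicollinearity: the $n \times K$ data matrix $X$ has rank $K$.
4. Spherical errors: $E[\varepsilon_i^2 \mid x_1, x_2, \dots, x_n] = \sigma^2 > 0$ $(i = 1, 2, \dots, n)$ and $E[\varepsilon_i \varepsilon_j \mid x_1, x_2, \dots, x_n] = 0$ $(i \neq j)$.

Under these assumptions the OLS estimator of $\beta$, $b = (X'X)^{-1}X'y$, has the following properties:

1. Unbiasedness: $E[b \mid X] = \beta$.
2. $\mathrm{Var}(b \mid X) = \sigma^2 (X'X)^{-1}$.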
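3. Gauss-Markov theorem: $b$ is BLUE (Best Linear Unbiased Estimator), that is, it is efficient among linear unbiased estimators.
4. $\mathrm{Cov}(b, e \mid X) = \mathrm{Cov}(b, y - Xb \mid X) = 0$, where $e = y - Xb$ is the vector of OLS residuals.
5. $E[e'e/(n - K) \mid X] = \sigma^2$.

OLS in Stata
OLS is run with reg (regress), listing the dependent variable first:

    reg duration gender inc700d

This regresses duration on gender and inc700d. To omit the constant, add nocons:

    reg duration gender inc700d, nocons

White's heteroskedasticity-robust standard errors are obtained with robust:

    reg duration gender inc700d, robust

The alternative estimators hc2 and hc3 discussed by Davidson and MacKinnon (1993, pp. 554-556) [13] are also available as options. For first-order serial correlation, prais estimates the model by Prais-Winsten; with the corc option it uses Cochrane-Orcutt instead.

8.1.1 Deriving the OLS Estimator and Its Properties
Suppose the model has $k$ explanatory variables, the first of which is the constant, and $t$ observations:
$$\begin{aligned}
Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + u_1 \\
Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_k X_{k2} + u_2 \\
&\;\;\vdots \\
Y_t &= \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t
\end{aligned} \tag{1}$$

[13] Davidson, Russell and James G. MacKinnon. 1993. Estimation and Inference in Econometrics. Oxford University Press.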
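In matrix form, define
$$y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_t \end{pmatrix},\quad
X = \begin{pmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2t} & X_{3t} & \cdots & X_{kt} \end{pmatrix},\quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix},\quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_t \end{pmatrix} \tag{2}$$
Then the system (1) becomes
$$y = X\beta + u \tag{3}$$
where $y$ is $t \times 1$, $X$ is $t \times k$, $\beta$ is $k \times 1$ and $u$ is $t \times 1$. Multiplying out the first row of (3) gives back the first equation of (1):
$$Y_1 = \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + u_1 \tag{4}$$
OLS chooses $\beta$ to minimize the sum of squared errors. From (3),
$$u = y - X\beta \tag{5}$$
and the sum of squares is
$$u_1^2 + u_2^2 + \cdots + u_t^2 = \begin{pmatrix} u_1 & u_2 & \cdots & u_t \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_t \end{pmatrix} = u'u \tag{6}$$
Substituting (5),
$$u'u = (y - X\beta)'(y - X\beta) \tag{7}$$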
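The transpose of a difference is the difference of the transposes, so
$$u'u = (y' - (X\beta)')(y - X\beta) \tag{8}$$
Since $(AB)' = B'A'$,
$$u'u = (y' - \beta'X')(y - X\beta) \tag{9}$$
Expanding as $(a + b)(c + d) = ac + ad + bc + bd$,
$$u'u = y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \tag{10}$$
The term $\beta'X'y$ is a scalar, so it equals its transpose $y'X\beta$:
$$u'u = y'y - y'X\beta - y'X\beta + \beta'X'X\beta \tag{11}$$
$$u'u = y'y - 2y'X\beta + \beta'X'X\beta \tag{12}$$
The $\beta$ that minimizes (12) is the OLS estimator $b$. The corresponding residuals are $e = y - Xb$, so $b$ minimizes $e'e$. To find $b$, differentiate (12) with respect to $\beta$ and set the derivative to zero.

Differentiation with respect to a vector
For a function $f(x)$ of $x = (x_1, x_2, \dots, x_k)'$, define
$$\frac{\partial f}{\partial x} = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_k \end{pmatrix} \tag{13}$$
For example, let
$$f(x) = a'x = \begin{pmatrix} a_1 & a_2 & \cdots & a_k \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} = a_1 x_1 + a_2 x_2 + \cdots + a_k x_k \tag{14}$$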
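Then
$$\frac{\partial f}{\partial x} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{pmatrix} = a \tag{15}$$
that is,
$$\frac{\partial (a'x)}{\partial x} = a \tag{16}$$
Similarly, for a symmetric matrix $A$ [22],
$$\frac{\partial (x'Ax)}{\partial x} = 2Ax \tag{17}$$
Now apply (16) and (17) to
$$u'u = y'y - 2y'X\beta + \beta'X'X\beta \tag{18}$$
writing $y'X\beta = (X'y)'\beta$ and taking $A = X'X$. Setting the derivative to zero at $\beta = b$ gives
$$\frac{\partial (u'u)}{\partial \beta}\Big|_{\beta = b} = -2X'y + 2X'Xb = 0 \tag{19}$$
$$2X'Xb = 2X'y \tag{20}$$
$$X'Xb = X'y \tag{21}$$
$X'X$ is invertible when the columns of $X$ are linearly independent (assumption 3). Premultiplying (21) by $(X'X)^{-1}$ then gives the OLS estimator
$$b = (X'X)^{-1}X'y \tag{22}$$

[22] $X'X$ is symmetric, since $(X'X)' = X'X$.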
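The normal equations X'Xb = X'y can be solved directly. The Python sketch below does this (it is not how Stata's regress is implemented internally). Under assumption 4 it also computes the usual standard errors, using properties 2 and 5: $s^2 = e'e/(n-K)$ and $\mathrm{se}(b_j) = \sqrt{s^2 [(X'X)^{-1}]_{jj}}$. The helper names solve and ols and the small dataset are hypothetical.

```python
import math


def solve(A, rhs):
    """Solve A v = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X'X is singular: the columns of X are linearly dependent")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def ols(X, y):
    """Return (b, standard errors, residuals). X is a list of rows, constant included."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    b = solve(XtX, Xty)                       # normal equations X'X b = X'y
    e = [y[i] - sum(X[i][a] * b[a] for a in range(k)) for i in range(n)]
    s2 = sum(ei * ei for ei in e) / (n - k)   # e'e/(n-K), unbiased for sigma^2
    se = []
    for a in range(k):
        unit = [1.0 if j == a else 0.0 for j in range(k)]
        inv_col = solve(XtX, unit)            # column a of (X'X)^{-1}
        se.append(math.sqrt(s2 * inv_col[a]))
    return b, se, e


X = [[1.0, float(x)] for x in range(5)]       # constant and one regressor
b, se, e = ols(X, [1.1, 2.9, 5.2, 6.8, 9.0])
print(b, se)
```

The residuals returned by ols satisfy X'e = 0, which is equation (21) rewritten as X'(y - Xb) = 0.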
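Unbiasedness of OLS
Subtract $\beta$ from $b$:
$$b - \beta = (X'X)^{-1}X'y - \beta \tag{23}$$
Substituting $y = X\beta + u$,
$$b - \beta = (X'X)^{-1}X'(X\beta + u) - \beta \tag{24}$$
$$b - \beta = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'u - \beta \tag{25}$$
Since $(X'X)^{-1}X'X = I$,
$$b - \beta = \beta + (X'X)^{-1}X'u - \beta \tag{26}$$
$$b - \beta = (X'X)^{-1}X'u \tag{27}$$
The difference $b - \beta$ is called the sampling error. Taking expectations conditional on $X$,
$$E[b - \beta] = E[(X'X)^{-1}X'u] \tag{28}$$
$$E[b] - \beta = (X'X)^{-1}X'E[u] \tag{29}$$
and since $E[u] = 0$,
$$E[b] = \beta \tag{30}$$
so the OLS estimator is unbiased.

Variance of OLS
The variance of $b$ is a $k \times k$ matrix. Because $\beta$ is a constant,
$$\mathrm{var}(b) = \mathrm{var}(b - \beta) \tag{31}$$
and by (27), using the sampling error,
$$\mathrm{var}(b) = \mathrm{var}((X'X)^{-1}X'u) \tag{32}$$
For a constant matrix $A$ and a random vector $x$,
$$\mathrm{var}(Ax) = A\,\mathrm{var}(x)\,A' \tag{33}$$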
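Applying (33) with $A = (X'X)^{-1}X'$,
$$\mathrm{var}(b) = (X'X)^{-1}X'\,\mathrm{var}(u)\,((X'X)^{-1}X')' \tag{34}$$
Since $E[u] = 0$,
$$\mathrm{var}(u) = E(uu') \tag{35}$$
$$\mathrm{var}(b) = (X'X)^{-1}X'E(uu')((X'X)^{-1}X')' \tag{36}$$
With homoskedastic, uncorrelated errors (assumption 4), $E(uu') = \sigma^2 I_t$, where $I_t$ is the $t \times t$ identity matrix:
$$\mathrm{var}(b) = (X'X)^{-1}X'(\sigma^2 I_t)((X'X)^{-1}X')' \tag{37}$$
$$\mathrm{var}(b) = \sigma^2 (X'X)^{-1}X'((X'X)^{-1}X')' \tag{38}$$
Because $(X'X)^{-1}$ is symmetric, $((X'X)^{-1}X')' = X(X'X)^{-1}$ [27]. Hence
$$\mathrm{var}(b) = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1} \tag{39}$$
This is property 2 of OLS.

8.2 IV and GMM
GMM stands for Generalized Method of Moments (Hayashi 2000, pp. 226-231 [28]). Consider
$$y_i = \delta_1 z_{1i} + \delta_2 z_{2i} + \cdots + \delta_L z_{Li} + \varepsilon_i$$

[27] $(AB)^{-1} = B^{-1}A^{-1}$, since $ABB^{-1}A^{-1} = I$.
[28] Hayashi, Fumio. 2000. Econometrics. Princeton University Press.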
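Here $\delta_1, \delta_2, \dots, \delta_L$ are the $L$ coefficients, $z_{1i}, z_{2i}, \dots, z_{Li}$ the regressors and $\varepsilon_i$ the error term. Suppose $K$ variables $x_{1i}, x_{2i}, \dots, x_{Ki}$ are available as instruments (IV: Instrumental Variables), that is, they are uncorrelated with the error:
$$E\begin{pmatrix} x_{1i} \\ x_{2i} \\ \vdots \\ x_{Ki} \end{pmatrix}\varepsilon_i = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Substituting $\varepsilon_i = y_i - \delta_1 z_{1i} - \delta_2 z_{2i} - \cdots - \delta_L z_{Li}$,
$$E\begin{pmatrix} x_{1i}(y_i - \delta_1 z_{1i} - \delta_2 z_{2i} - \cdots - \delta_L z_{Li}) \\ x_{2i}(y_i - \delta_1 z_{1i} - \delta_2 z_{2i} - \cdots - \delta_L z_{Li}) \\ \vdots \\ x_{Ki}(y_i - \delta_1 z_{1i} - \delta_2 z_{2i} - \cdots - \delta_L z_{Li}) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
These $K$ equations are the orthogonality conditions. Replacing the expectations by sample averages over the observed $x_{ki}, z_{li}, y_i$ gives $K$ equations in the $L$ unknowns $(\delta_1, \delta_2, \dots, \delta_L)$. Three cases arise:

- $K < L$: the equations cannot pin down $\delta$, and the model is not identified.
- $K = L$: the model is just identified. The equations generally have a unique solution, the IV estimator.
- $K > L$: the model is over identified. In general no $\delta$ satisfies all the sample equations exactly, and GMM chooses $\delta$ to bring them as close to zero as possible. Hansen's J-statistic (Hayashi 2000, pp. 217-218) then provides a specification test of the overidentifying restrictions.

The requirement $K \geq L$ is the order condition. OLS is the special case of IV and GMM in which the regressors serve as their own instruments ($x_i = z_i$).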
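The just-identified case ($K = L$) can be made concrete with the Python sketch below. It assumes regressors $(1, z_i)$ and instruments $(1, w_i)$, so $K = L = 2$, and solves the two sample orthogonality conditions for $(\delta_1, \delta_2)$. The function name iv_just_identified and the data are hypothetical. Passing w = z reproduces the OLS slope, which illustrates OLS as the special case $x_i = z_i$.

```python
def iv_just_identified(y, z, w):
    """Solve  (1/n) sum (y_i - d1 - d2*z_i)       = 0
              (1/n) sum w_i*(y_i - d1 - d2*z_i)   = 0   for (d1, d2)."""
    n = len(y)
    ybar, zbar, wbar = sum(y) / n, sum(z) / n, sum(w) / n
    s_wy = sum((wi - wbar) * (yi - ybar) for wi, yi in zip(w, y))
    s_wz = sum((wi - wbar) * (zi - zbar) for wi, zi in zip(w, z))
    if abs(s_wz) < 1e-12:
        raise ValueError("w is uncorrelated with z in the sample: not identified")
    d2 = s_wy / s_wz          # from the second condition after substituting d1
    d1 = ybar - d2 * zbar     # from the first condition
    return d1, d2


y = [2.0, 3.1, 4.5, 4.9, 7.2, 8.1]
z = [1.0, 2.0, 2.5, 3.5, 4.0, 5.5]
w = [0.0, 1.0, 1.0, 2.0, 3.0, 3.0]
print(iv_just_identified(y, z, w))   # IV with instrument w
print(iv_just_identified(y, z, z))   # instruments = regressors: OLS
```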
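The instruments are typically excluded variables: exogenous variables that do not appear in the equation (exclusion restrictions). They are needed because some regressors are correlated with the error [29], which makes OLS inconsistent. Identification also requires the rank condition, namely that $E[x_i z_i']$ has full column rank $L$. Under conditional homoskedasticity, efficient GMM reduces to 2SLS (Hayashi 2000, pp. 226-231), and 2SLS is itself a GMM estimator.

2SLS can be computed as two OLS regressions:

1. Regress each regressor $z_{li}$ on all the instruments and compute the predicted (fitted) values $\hat z_{li}$.
2. Regress $y_i$ on $\hat z_i$ by OLS to obtain $\hat\delta$.

The residuals, however, must be computed as $y_i - z_i'\hat\delta$, not $y_i - \hat z_i'\hat\delta$. The standard errors printed by a manual second-stage regression are therefore wrong.

In Stata, 2SLS is run with ivreg:

    ivreg depvar [exogenous regressors] (endogenous regressors = excluded instruments)

The options include:

- robust: White standard errors, as with OLS.
- nocons: no constant.
- first: reports the first-stage regressions.

2SLS is consistent, but in finite samples it can be badly biased when the instruments are only weakly correlated with the regressors (weak instruments; Hayashi 2000, p. 229; Staiger and Stock 1997 [30]). It is therefore advisable to check the F statistic and R² of the first-stage regressions. Efficient GMM is available in the user-written ado ivgmm0, which has the same syntax:

    ivgmm0 depvar [exogenous regressors] (endogenous regressors = excluded instruments)

[29] Wooldridge 2001, pp. 49-51: the regressors become endogenous through simultaneity, omitted variables, or measurement error.
[30] Staiger, D. and J. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557-586.

8.3 Probit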
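Consider a binary (Yes/No) outcome $y_i$ explained by $x_i$. It is modeled through a latent variable $y_i^*$:
$$y_i^* = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
where $\beta_0, \beta_1$ are parameters. We observe $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ if $y_i^* \leq 0$. Then
$$P(y_i = 1) = \Pr(y_i^* > 0) = \Pr(\beta_0 + \beta_1 x_i + \varepsilon_i > 0) = \Pr(\varepsilon_i > -(\beta_0 + \beta_1 x_i)) \tag{40}$$
Let $\Phi(z)$ be the standard normal distribution function. Because $\varepsilon_i \sim N(0, \sigma^2)$ is symmetric about zero,
$$P(y_i = 1) = \Pr(\varepsilon_i < \beta_0 + \beta_1 x_i) = \Phi\left(\frac{\beta_0 + \beta_1 x_i}{\sigma}\right) \tag{41}$$
Similarly,
$$P(y_i = 0) = \Pr(y_i^* \leq 0) = \Pr(\beta_0 + \beta_1 x_i + \varepsilon_i \leq 0) = \Pr(\varepsilon_i \leq -(\beta_0 + \beta_1 x_i)) \tag{42}$$
$$P(y_i = 0) = \Phi\left(-\frac{\beta_0 + \beta_1 x_i}{\sigma}\right) = 1 - \Phi\left(\frac{\beta_0 + \beta_1 x_i}{\sigma}\right) \tag{43}$$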
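The likelihood of $n$ observations is
$$L = \prod_{i=1}^{n} \left[\Phi\left(\frac{\beta_0 + \beta_1 x_i}{\sigma}\right)\right]^{y_i} \left[1 - \Phi\left(\frac{\beta_0 + \beta_1 x_i}{\sigma}\right)\right]^{1 - y_i}$$
and the log-likelihood is
$$\ln L = \sum_{i=1}^{n} \left\{ y_i \ln \Phi\left(\frac{\beta_0 + \beta_1 x_i}{\sigma}\right) + (1 - y_i) \ln\left[1 - \Phi\left(\frac{\beta_0 + \beta_1 x_i}{\sigma}\right)\right] \right\}$$
Only the ratios $\beta_0/\sigma$ and $\beta_1/\sigma$ enter the likelihood, so $\sigma$ cannot be estimated separately. It is normalized to $\sigma = 1$, and the estimates $b_0, b_1$ are obtained by maximizing $\ln L$. The predicted probability is
$$P(y_i = 1) = \Phi(b_0 + b_1 x_i)$$
The coefficients $b_0, b_1$ are not themselves the effects of $x$ on $P(y_i = 1)$. The marginal effect of $x$ is
$$\frac{\partial P(y_i = 1)}{\partial x} = \phi(b_0 + b_1 x)\, b_1$$
where $\phi$ is the standard normal density. The marginal effect depends on the value of $x$ at which it is evaluated. The change in $P(y_i = 1)$ from a given change in $x$ (for example, from 70 to 75) therefore depends on the starting value.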
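The Python sketch below evaluates the probit log-likelihood with $\sigma = 1$ and maximizes it. It then computes the marginal effect $\phi(b_0 + b_1 x)b_1$ and a discrete change in the predicted probability. Stata's probit maximizes the same likelihood, but the Newton-Raphson loop here, with the analytic Hessian, is only an assumption about one reasonable way to do it, not Stata's internal algorithm. All function names and the data are hypothetical.

```python
import math


def Phi(v):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def probit_loglik(b0, b1, x, y):
    """ln L with sigma normalized to 1."""
    return sum(math.log(Phi(b0 + b1 * xi)) if yi == 1 else math.log(1.0 - Phi(b0 + b1 * xi))
               for xi, yi in zip(x, y))


def probit_fit(x, y, iters=50):
    """Newton-Raphson on the concave probit log-likelihood, starting from b = 0."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            q = b0 + b1 * xi
            lam = phi(q) / Phi(q) if yi == 1 else -phi(q) / (1.0 - Phi(q))  # d lnL_i / dq
            w = lam * (lam + q)                                             # -d2 lnL_i / dq2
            g0, g1 = g0 + lam, g1 + lam * xi
            h00, h01, h11 = h00 + w, h01 + w * xi, h11 + w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # (-H)^{-1} g for the 2x2 case
        step1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1


def marginal_effect(b0, b1, x):
    """dP(y=1)/dx = phi(b0 + b1 x) * b1."""
    return phi(b0 + b1 * x) * b1


def discrete_change(b0, b1, x_from, x_to):
    """Change in P(y=1) when x moves from x_from to x_to."""
    return Phi(b0 + b1 * x_to) - Phi(b0 + b1 * x_from)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = probit_fit(x, y)
print(b0, b1, marginal_effect(b0, b1, 2.5), discrete_change(b0, b1, 2.0, 3.0))
```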
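Stata commands for binary and ordered outcomes:

- probit depvar varlist: probit model.
- logit depvar varlist: logit model.
- dprobit: reports marginal effects instead of coefficients.
- oprobit: ordered probit.

8.4 Heckman's Two-Step Estimator
Consider the model
$$y_i = \beta'x_i + u_i$$
where $y_i$ is observed only if
$$\pi'z_i + \varepsilon_i \geq 0$$
Here $z_i$ are the variables that determine selection, $\pi$ their coefficients, and $\varepsilon_i$ an error with variance $\sigma_\varepsilon^2$ and covariance $\sigma_{u\varepsilon}$ with $u_i$. Assume
$$\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_u^2 & \sigma_{u\varepsilon} \\ \sigma_{u\varepsilon} & \sigma_\varepsilon^2 \end{pmatrix}\right]$$
In the observed sample the Gauss-Markov assumptions fail, because
$$E[y_i \mid x_i, z_i, \pi'z_i + \varepsilon_i \geq 0] = \beta'x_i + E[u_i \mid \varepsilon_i \geq -\pi'z_i]$$
Writing $\eta_i$ for the deviation of $y_i$ from this conditional mean,
$$y_i = \beta'x_i + E[u_i \mid \varepsilon_i \geq -\pi'z_i] + \eta_i$$
To compute $E[u_i \mid \varepsilon_i \geq -\pi'z_i]$, write $\varepsilon_i$ and $u_i$ in terms of independent variables using the Choleski decomposition (Greene 1997, p. 44, p. 179):
$$\begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{u\varepsilon} \\ \sigma_{u\varepsilon} & \sigma_u^2 \end{pmatrix} = \begin{pmatrix} \sigma_\varepsilon & 0 \\ \sigma_{u\varepsilon}/\sigma_\varepsilon & \tilde\sigma_u \end{pmatrix}\begin{pmatrix} \sigma_\varepsilon & \sigma_{u\varepsilon}/\sigma_\varepsilon \\ 0 & \tilde\sigma_u \end{pmatrix}, \qquad \tilde\sigma_u^2 = \sigma_u^2 - \sigma_{u\varepsilon}^2/\sigma_\varepsilon^2$$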
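Let $x_1, x_2$ be independent $N(0, 1)$ variables. Then $\varepsilon_i$ and $u_i$ can be written as
$$\begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} + \begin{pmatrix} \sigma_\varepsilon & 0 \\ \sigma_{u\varepsilon}/\sigma_\varepsilon & \tilde\sigma_u \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \sigma_\varepsilon x_1 \\ (\sigma_{u\varepsilon}/\sigma_\varepsilon)x_1 + \tilde\sigma_u x_2 \end{pmatrix}$$
with $\tilde\sigma_u^2 = \sigma_u^2 - \sigma_{u\varepsilon}^2/\sigma_\varepsilon^2$. Since $x_2$ is independent of $x_1$ and has mean zero,
$$E[u_i \mid \varepsilon_i \geq -\pi'z_i] = E\left[\frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon}x_1 + \tilde\sigma_u x_2 \,\Big|\, \sigma_\varepsilon x_1 \geq -\pi'z_i\right] = \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon}\, E\left[x_1 \,\Big|\, x_1 \geq -\frac{\pi'z_i}{\sigma_\varepsilon}\right]$$
Let $\phi(\cdot)$ and $\Phi(\cdot)$ be the standard normal density and distribution functions, and use $\phi'(m) = -m\phi(m)$ and $\phi(m) = \phi(-m)$:
$$\begin{aligned}
E[u_i \mid \varepsilon_i \geq -\pi'z_i] &= \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon}\,\frac{1}{1 - \Phi(-\pi'z_i/\sigma_\varepsilon)}\int_{-\pi'z_i/\sigma_\varepsilon}^{\infty} x_1\phi(x_1)\,dx_1 \\
&= \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon}\,\frac{1}{\Phi(\pi'z_i/\sigma_\varepsilon)}\int_{-\pi'z_i/\sigma_\varepsilon}^{\infty} -\phi'(x_1)\,dx_1 \\
&= \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon}\,\frac{\phi(\pi'z_i/\sigma_\varepsilon)}{\Phi(\pi'z_i/\sigma_\varepsilon)} = \lambda\,\frac{\phi(\pi'z_i/\sigma_\varepsilon)}{\Phi(\pi'z_i/\sigma_\varepsilon)}
\end{aligned}$$
where $\lambda = \sigma_{u\varepsilon}/\sigma_\varepsilon$. Hence
$$y_i = \beta'x_i + \lambda\,\frac{\phi(\pi'z_i/\sigma_\varepsilon)}{\Phi(\pi'z_i/\sigma_\varepsilon)} + \eta_i$$
where $\eta_i$ is well-behaved.

Heckman's two-step estimator (Heckit) is built on this equation:

1. Estimate a probit of selection on $z_i$, which yields $\pi/\sigma_\varepsilon$.
2. Compute the inverse Mills ratio $\phi(\hat\pi'z_i)/\Phi(\hat\pi'z_i)$.
3. Run OLS of $y_i$ on $x_i$ and the inverse Mills ratio.

In Amemiya's (1995) classification this model is the Type 2 Tobit. A related model is the Type 5 Tobit:
$$y_i = \beta'x_i + u_i, \qquad \beta = \beta_1 \ \text{if}\ P_i = \pi'z_i + \varepsilon_i < 0, \qquad \beta = \beta_2 \ \text{if}\ P_i = \pi'z_i + \varepsilon_i \geq 0$$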
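The derivation can be checked numerically. The Python sketch below compares the closed form $\phi(c)/\Phi(c)$ with $E[x_1 \mid x_1 \geq -c]$ computed by the trapezoid rule. It also builds $(\varepsilon_i, u_i)$ from independent $x_1, x_2$ with the Choleski factor and checks by simulation that the mean of $u_i$ among selected observations is close to $(\sigma_{u\varepsilon}/\sigma_\varepsilon)\,\phi(\pi'z_i/\sigma_\varepsilon)/\Phi(\pi'z_i/\sigma_\varepsilon)$. The parameter values, the integration range and the function names are hypothetical choices for illustration.

```python
import math
import random


def phi(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def Phi(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def inverse_mills(c):
    """phi(c)/Phi(c), the closed form derived in the text, with c = pi'z/sigma_eps."""
    return phi(c) / Phi(c)


def truncated_mean_numeric(c, upper=12.0, steps=100000):
    """E[x1 | x1 >= -c] for x1 ~ N(0,1), trapezoid rule on [-c, upper]."""
    a = -c
    h = (upper - a) / steps
    total = 0.5 * (a * phi(a) + upper * phi(upper))
    for i in range(1, steps):
        t = a + i * h
        total += t * phi(t)
    return total * h / (1.0 - Phi(a))


def simulate_selected_mean(piz, sig_e, sig_ue, sig_u, n=200000, seed=1):
    """Monte Carlo E[u | eps >= -pi'z]; requires sig_u**2 > (sig_ue/sig_e)**2."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(sig_u ** 2 - (sig_ue / sig_e) ** 2)  # tilde sigma_u
    total, count = 0.0, 0
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = sig_e * x1
        u = (sig_ue / sig_e) * x1 + cond_sd * x2
        if eps >= -piz:               # y_i is observed: pi'z_i + eps_i >= 0
            total += u
            count += 1
    return total / count


print(truncated_mean_numeric(0.5), inverse_mills(0.5))
print(simulate_selected_mean(0.5, 2.0, 0.8, 1.0), 0.4 * inverse_mills(0.25))
```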
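If $P_i$ only determines which $\beta$ applies, the model is a switching regression. If $P_i$ reflects an individual's own choice between the two regimes, it is a self-selection model.

Stata
The Heckman model is estimated with

    heckman depvar xvars, select(zvars)

The explanatory variables $x_i$ follow the dependent variable, and the selection variables $z_i$ go in select(). By default heckman uses maximum likelihood; Heckman's two-step estimator is obtained with the twostep option:

    heckman depvar xvars, select(zvars) twostep

8.5 Panel Data [31]
Basic panel commands:

    xtdes, i(idvar) t(timevar)
    xtsum, i(idvar) t(timevar)
    xtreg depvar varlist, fe i(idvar)
    xtreg depvar varlist, re i(idvar)
    xthaus

Their roles are as follows:

- xtdes: describes the panel structure.
- xtsum: reports summary statistics.
- xtreg with fe: estimates the fixed-effects model.
- xtreg with re: estimates the random-effects model.
- xthaus: performs the Hausman test after the random-effects estimation.

Instrumental-variable estimation for panel data is available with xtivreg [32], which assumes strict exogeneity of the instruments [33]. xtivreg offers four estimators, chosen by option:

- re: the GLS random-effects model.
- be: the between-effects model.
- fe: the fixed-effects model.
- fd: the first-differenced estimator.

The syntax is

    xtivreg depvar [exogenous regressors] (endogenous regressors = instruments), re

with be, fe or fd in place of re. Unlike xtreg, which takes the panel variable in i(), the first-differenced estimator requires the panel and time variables to be declared with tsset. The first option reports the first-stage regressions. For the GLS random-effects model, Baltagi's EC2SLS estimator is also available.

Other panel estimators:

- Arellano and Bond (1991, Review of Economic Studies): xtabond.
- Amemiya and MaCurdy (1986, Econometrica) and Hausman and Taylor (1981, Econometrica): xthtaylor.
- Panel logit and probit: xtlogit and xtprobit.

In Stata 8 the panel structure is declared with tsset.

[31] See the Stata 8 Cross-Sectional Time-Series Reference Manual.
[32] See the Stata 8 documentation.
[33] Baltagi, Badi H. 2001. Econometric Analysis of Panel Data, 2nd edition. John Wiley and Sons. Wooldridge, Jeffrey M. 2001. Econometric Analysis of Cross Section and Panel Data. MIT Press.

8.6 Duration Analysis
Let durac be the duration and faildoc the failure indicator. First declare the data as survival-time data:

    stset durac, failure(faildoc)

A Cox proportional hazards model:

    stcox gender hcon2-hcon6 inc700d copay2-copay7

A parametric Weibull model:

    streg gender hcon2-hcon6 inc700d copay2-copay7, d(weibull)

sts graph graphs the Kaplan-Meier survival function, and sts graph, na graphs the Nelson-Aalen cumulative hazard.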
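The curves that sts graph and sts graph, na draw are the Kaplan-Meier product-limit estimate $S(t) = \prod_{t_j \le t}(1 - d_j/n_j)$ and the Nelson-Aalen estimate $H(t) = \sum_{t_j \le t} d_j/n_j$. Here $d_j$ is the number of failures and $n_j$ the number at risk at failure time $t_j$. The Python sketch below computes both from durations and failure flags (playing the roles of durac and faildoc). It assumes the usual convention that spells censored at $t_j$ are still at risk at $t_j$. The function name km_na and the data are hypothetical.

```python
def km_na(durations, failed):
    """Kaplan-Meier S(t) and Nelson-Aalen H(t) at each distinct failure time.
    failed[i] is 1 for a failure and 0 for a censored spell."""
    data = sorted(zip(durations, failed))
    at_risk = len(data)
    surv, cumhaz = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # group tied durations
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk          # product-limit step
            cumhaz += deaths / at_risk              # Nelson-Aalen increment
            out.append((t, surv, cumhaz))
        at_risk -= deaths + censored                # both leave the risk set after t
    return out


for t, s, h in km_na([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 4), round(h, 4))
```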
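8.7 Nonparametric Methods

Density Estimation
Let $X_1, \dots, X_N$ be i.i.d. (independently and identically distributed) draws of a random variable $X$ with a smooth density $f(x)$. For a small $h > 0$,
$$2h\,f(x) \approx \int_{x-h}^{x+h} f(u)\,du = \Pr(X \in [x - h, x + h])$$
so $f(x)$ can be estimated by
$$\hat f_h(x) = \frac{1}{2h}\,\widehat{\Pr}(X \in [x - h, x + h])$$
where the probability is estimated by the fraction of the $N$ observations that fall in $[x - h, x + h]$. Using the indicator function $I(\cdot)$,
$$\hat f_h(x) = \frac{1}{2h}\,\frac{1}{N}\sum_{i=1}^{N} I(|X_i - x| \leq h) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{h}\,\frac{1}{2}\,I\left(\frac{|X_i - x|}{h} \leq 1\right) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{h}\,K\left(\frac{x - X_i}{h}\right) = \frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)$$
where:

- $K(u) = \frac{1}{2} I(|u| \leq 1)$ is the kernel.
- $K_h(u) = K(u/h)/h$.
- $h$ is the bandwidth.

This is the kernel density estimator of $f(x)$. The kernel here is a step function: observations inside the window $[x - h, x + h]$ get equal weight, and those outside get none.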
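A direct Python transcription of $\hat f_h(x) = \frac{1}{N}\sum_i K_h(x - X_i)$ is shown below. It uses the uniform kernel defined above, plus a Gaussian kernel for comparison. The function names and data are hypothetical, and Stata's kdensity evaluates the estimate on its own grid of points rather than at a single x.

```python
import math


def uniform_kernel(u):
    """K(u) = (1/2) I(|u| <= 1)."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def gaussian_kernel(u):
    """K(u) = (2 pi)^(-1/2) exp(-u^2 / 2)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel=uniform_kernel):
    """f_hat_h(x) = (1/N) sum_i K_h(x - X_i), with K_h(u) = K(u/h)/h."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


data = [-0.5, 0.5, 3.0]
print(kde(0.0, data, 1.0))                    # share in [-1, 1] divided by 2h
print(kde(0.0, data, 0.5, gaussian_kernel))
```

With the uniform kernel the result equals the share of observations within h of x divided by 2h. This is the window interpretation derived above.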
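Other kernel functions $K$ can be used. A common choice is the Gaussian kernel
$$K(u) = (2\pi)^{-1/2}\exp(-u^2/2)$$
In general, any piecewise continuous $K(u)$ with
$$K(u) = K(-u), \qquad \int K(u)\,du = 1$$
can serve as a kernel. In Stata, kdensity varname draws the kernel density estimate. The kernel (for example, Gaussian or triangle) is chosen by option, and generate(newvar_x newvar_d) saves the evaluation points and the estimated density.

Regression Estimation
For a pair $(X, Y)$ with joint density $f(x, y)$, the regression function is
$$m(x) = E(Y \mid X = x) = \frac{\int y f(x, y)\,dy}{\int f(x, y)\,dy}$$
Replace $f(x, y)$ by the kernel density estimator
$$\hat f_h(x, y) = \frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)K_h(y - Y_i)$$
Since the kernel integrates to one,
$$\int \hat f_h(x, y)\,dy = \frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)\int K_h(y - Y_i)\,dy = \frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)$$
Since the kernel is symmetric and integrates to one, $\int y\,K_h(y - Y_i)\,dy = Y_i$, so
$$\int y\,\hat f_h(x, y)\,dy = \frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)\int y\,K_h(y - Y_i)\,dy = \frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)\,Y_i$$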
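Therefore
$$\hat m_h(x) = \frac{\int y\,\hat f_h(x, y)\,dy}{\int \hat f_h(x, y)\,dy} = \frac{\frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)\,Y_i}{\frac{1}{N}\sum_{i=1}^{N} K_h(x - X_i)}$$
This is the Nadaraya-Watson kernel estimate. In Stata it is computed with

    kernreg Y X, bwidth(#) kercode(#) npoint(#)

The options are:

- bwidth: the bandwidth (smoothing parameter).
- kercode: selects the kernel (codes 1 to 7).
- npoint: the number of points at which $\hat m_h(x)$ is evaluated.

9 Using Estimation Results

9.1 Coefficients and Predicted Values
After estimation, Stata stores each coefficient as _b[varname], and the constant as _b[_cons]. These can be used in expressions, for example in generate and replace.

Predicted (fitted) values are obtained with predict, a post-estimation command. The xb option gives the linear prediction and stdp its standard error. predict computes values for every observation in the data, including those not used in the estimation; e(sample) marks the observations in the estimation sample.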
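To restrict the predictions to the estimation sample:

    predict newvar if e(sample), xb

9.2 F and χ² Tests
Hypotheses on coefficients are tested with test, again referring to the coefficients as _b[varname]. In the model
$$hours_i = \beta_0 + \beta_1 lnw_i + \beta_2 asset_i + \varepsilon_i$$
the hypothesis $\beta_1 = \beta_2$ is tested by

    reg hours lnw asset
    test _b[lnw] = _b[asset]

After OLS, test reports an F statistic. Linear combinations are tested in the same way, for example $\beta_1 + \beta_2 = 1$:

    test _b[lnw] + _b[asset] = 1

display shows the value of an expression, such as the estimate of $\beta_1 + \beta_2$:

    reg hours lnw asset
    display _b[lnw] + _b[asset]
    test _b[lnw] + _b[asset] = 1

Several restrictions are tested jointly with the accumulate option. For example, to test $\beta_1 + \beta_2 = 1$ and $\beta_1 = 0.5$ together:

    test _b[lnw] + _b[asset] = 1
    test _b[lnw] = 0.5, accumulate

The second command reports the joint test of both restrictions.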
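After regress, the F statistic from test for $q$ linear restrictions $Rb = r$ is $F = (Rb - r)'[RVR']^{-1}(Rb - r)/q$, where $V$ is the estimated covariance matrix of $b$. accumulate simply stacks the new restriction under the earlier ones, adding rows to $R$ and $r$. The Python sketch below computes this statistic. The coefficient vector (ordered lnw, asset, _cons as in Stata's output) and the covariance matrix are made-up numbers, and the names solve and wald_F are hypothetical.

```python
def solve(A, rhs):
    """Solve A v = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [x] for row, x in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("R V R' is singular: restrictions are redundant")
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= f * M[col][c]
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        v[i] = (M[i][n] - sum(M[i][c] * v[c] for c in range(i + 1, n))) / M[i][i]
    return v


def wald_F(b, V, R, r):
    """F = (Rb - r)' [R V R']^{-1} (Rb - r) / q."""
    q, k = len(R), len(b)
    d = [sum(R[j][a] * b[a] for a in range(k)) - r[j] for j in range(q)]
    RVR = [[sum(R[i][a] * V[a][c] * R[j][c] for a in range(k) for c in range(k))
            for j in range(q)] for i in range(q)]
    x = solve(RVR, d)
    return sum(di * xi for di, xi in zip(d, x)) / q


b = [0.7, 0.5, 10.0]                                   # lnw, asset, _cons (made up)
V = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 4.0]]
print(wald_F(b, V, [[1, 1, 0]], [1.0]))                          # lnw + asset = 1
print(wald_F(b, V, [[1, 1, 0], [1, 0, 0]], [1.0, 0.5]))          # ... and lnw = 0.5
```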
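9.3 Non-Linear Tests
Non-linear hypotheses are tested with testnl instead of test. For example, to test
$$\frac{1}{1 + \exp(\beta_0/\beta_1)} = 0.3$$
where $\beta_1$ is the coefficient on lnw:

    display 1/(1+exp(_b[_cons]/_b[lnw]))
    testnl 1/(1+exp(_b[_cons]/_b[lnw])) = 0.3

testnl computes a Wald test.

10 Programming

10.1 Nonlinear Least Squares
Nonlinear least squares is estimated with nl, which calls a program written by the user. The program is defined between program define and end, and its name must begin with nl:

    program define nlfcn
        version 6.0
        if "`1'" == "?" {
            global S_1 "parameter names"
            * set the initial values of the parameters here
            exit
        }
        replace `1' = expression
    end

where expression gives the fitted values as a function of the parameters. The program is then run with nl fcn depvar.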
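For example, consider the model
$$hours_i = -\epsilon \ln\left(\frac{1 - \alpha}{\alpha}\,w_i\right) + u_i$$
where ε (epsilon) and α (alpha) are parameters and w_i and hours_i are observed. Define the program nlsbst:

    program define nlsbst
        version 6.0
        if "`1'" == "?" {
            global S_1 "alpha epsilon"
            global alpha = 0.001
            global epsilon = 0.4
            exit
        }
        replace `1' = -$epsilon * ln((1-$alpha)*w/$alpha)
    end

and run

    nl sbst hours

Inside the program the parameters are referred to as the global macros $alpha and $epsilon.

10.2 Maximum Likelihood
Stata can maximize a likelihood written by the user, which is useful for models that have no built-in command (for example, some random effect models). ml offers four methods: lf, d0, d1 and d2. lf is the simplest and is used here.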
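Using lf involves two parts: a program that evaluates the log-likelihood, and the ml commands that call it.

    program define progname
        args lnf theta1 theta2 ...
        tempvar ...
        quietly replace `lnf' = ...
    end

    ml model lf progname (depvars = indepvars) ...
    ml check
    ml maximize

In the program:

- `lnf' is the variable to be filled with the log-likelihood of each observation.
- `theta1', `theta2', ... are the linear indexes of the equations.
- The dependent variables are referred to as $ML_y1, $ML_y2, ..., in the order they appear before the = sign in ml model (Gould and Sribney 1999, p. 27 [35]).

Example: the Weibull model
The log-likelihood of observation i is
$$\ln f_i = -\left(t_i \exp(-x_i'\beta)\right)^{\exp(s)} + d_i\left(s - x_i'\beta + (\exp(s) - 1)(\ln t_i - x_i'\beta)\right)$$
where:

- $t_i$ is the duration.
- $d_i$ indicates failure (1) or censoring (0).
- $x_i$ are the explanatory variables.
- $\beta$ and $s$ are parameters.

In the program, theta1 is x_i'β and theta2 is s; t_i and d_i are $ML_y1 and $ML_y2. The shape parameter is written as exp(s) so that it stays positive throughout the maximization.

[35] Gould, William and William Sribney. 1999. Maximum Likelihood Estimation with Stata. Stata Press, Texas.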
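Define $p = \exp(s)$, $M = (t_i\exp(-x_i'\beta))^p$ and $R = \ln t_i - x_i'\beta$, so that $\ln f_i = -M + d_i(s - x_i'\beta + (p - 1)R)$. In Stata:

    program define myweib
        version 6.0
        args lnf theta1 theta2
        tempvar p M R
        quietly generate double `p' = exp(`theta2')
        quietly generate double `M' = ($ML_y1*exp(-`theta1'))^`p'
        quietly generate double `R' = ln($ML_y1)-`theta1'
        quietly replace `lnf' = -`M' + $ML_y2*(`theta2'-`theta1'+(`p'-1)*`R')
    end

generate and replace are prefixed with quietly to suppress their output, and the temporary variables are stored in double precision. Suppose the duration $t_i$ is in time, the failure indicator $d_i$ in died, and $x_i$ is age. The model is then set up with

    ml model lf myweib (time died = age) ()

The second, empty () is the equation for theta2, which contains only a constant. Equations can be given names:

    ml model lf myweib (timeeq: time died = age) (sigma:)

Inside the program the dependent variables are still $ML_y1, $ML_y2, ..., and the equations are still theta1, theta2, .... To estimate without a constant in the first equation:

    ml model lf myweib (timeeq: time died = age, nocons) (sigma:)

Finally, check the program and maximize:

    ml check
    ml maximize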
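To confirm what myweib computes, the Python sketch below mirrors its per-observation log-likelihood. It also sums the contributions with theta1 = b0 + b1*age, matching the setup ml model lf myweib (time died = age) (). For a failure (d = 1) the value equals the log of the Weibull density with scale exp(theta1) and shape exp(theta2). For a censored spell (d = 0) it equals the log of the survivor function. The function names and the sample values are hypothetical.

```python
import math


def weibull_lnf(t, d, theta1, theta2):
    """Per-observation log-likelihood as in myweib:
    -M + d*(theta2 - theta1 + (p - 1)*R), p = exp(theta2),
    M = (t*exp(-theta1))**p, R = ln t - theta1."""
    p = math.exp(theta2)
    M = (t * math.exp(-theta1)) ** p
    R = math.log(t) - theta1
    return -M + d * (theta2 - theta1 + (p - 1.0) * R)


def weibull_loglik(b0, b1, s, time, died, age):
    """Sample log-likelihood with theta1 = b0 + b1*age and theta2 = s (constant only)."""
    return sum(weibull_lnf(t, d, b0 + b1 * a, s) for t, d, a in zip(time, died, age))


print(weibull_lnf(1.0, 1, 0.0, 0.0))   # exponential case with rate 1 at t = 1
print(weibull_loglik(0.5, 0.01, 0.2, [2.0, 5.0, 1.5], [1, 0, 1], [30, 50, 70]))
```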
11 Further Reading

For the full syntax and options of each command, see the Stata Reference Manual.