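mwp-037 regress - Linear regression

Contents
1.   Basic use of regress
1.1  Example dataset
1.2  Running regress and reading the output
1.3  Effect of the sample size
2.   Quadratic terms and standardized (beta) coefficients
3.   Regression without a constant term
4.   Weighted regression
5.   Robust standard errors

1. Basic use of regress

A typical regress run looks like this:

. regress mpg weight foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   69.75
       Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
    Residual |  824.171761    71   11.608053           R-squared     =  0.6627
-------------+------------------------------           Adj R-squared =  0.6532
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
     foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
       _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
------------------------------------------------------------------------------

(c) Copyright Math; (c) Copyright StataCorp LP (used with permission)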
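The output above is the fit of the model

\[ \mathit{mpg} = \beta_0 + \beta_1\,\mathit{weight} + \beta_2\,\mathit{foreign} + \epsilon \]

to the Stata Example dataset auto.dta. The sections below show how to run regress and how to read each part of its output.

1.1 Example dataset

The dataset regress1.dta contains 1,000 observations.

. use http://www.math-koubou.jp/stata/data11/regress1.dta

The first 5 observations are:

. list in 1/5 *1

          y   x1   x2   x3
  1.   25.4   49    3    3
  2.   35.8   54    7   10
  3.   36.2   73    9    6
  4.   11.6   26    2    3
  5.   16.8   45    8    3

The variable y was generated with

. generate y = 0.5*x1 + 2*x2 - 10 + rnormal(0, 10)

from x1 and x2. rnormal(m, s) returns a normal random number with mean m and standard deviation s (see [D] functions). Apart from the noise term, the true relationship is therefore

\[ y = 0.5\,x_1 + 2.0\,x_2 - 10 \tag{1} \]

y depends on x1 and x2; x3 plays no part in generating y.

*1 Menu: Data > Describe data > List data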
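The data-generating recipe in 1.1 can be mimicked outside Stata. The following is a minimal Python sketch, not the procedure used to build regress1.dta: the integer ranges for x1, x2 and x3 and the seed are assumptions made only for illustration, since the source does not say how the predictors were drawn. It shows that once the systematic part of equation (1) is removed, what remains behaves like noise with mean 0 and standard deviation 10.

```python
import random
import statistics

random.seed(111)  # arbitrary seed so that this sketch is reproducible

n = 1000
# ASSUMPTION: hypothetical predictor distributions, not taken from regress1.dta.
x1 = [random.randint(20, 80) for _ in range(n)]
x2 = [random.randint(1, 10) for _ in range(n)]
x3 = [random.randint(1, 10) for _ in range(n)]  # generated but never used in y

# Same recipe as: generate y = 0.5*x1 + 2*x2 - 10 + rnormal(0, 10)
y = [0.5 * a + 2 * b - 10 + random.gauss(0, 10) for a, b in zip(x1, x2)]

# Removing the systematic part of equation (1) leaves only the noise term.
noise = [yi - (0.5 * a + 2 * b - 10) for yi, a, b in zip(y, x1, x2)]
noise_mean = statistics.mean(noise)
noise_sd = statistics.stdev(noise)
print(f"noise mean = {noise_mean:.2f}, noise sd = {noise_sd:.2f}")
```

Because x3 never enters the formula for y, a regression on such data should find no effect of x3.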
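1.2 Running regress and reading the output

First, fit the regression to all 1,000 observations with regress.

Menu: Statistics > Linear models and related > Linear regression
  Model tab: Dependent variable: y
             Independent variables: x1 x2 x3

Figure 1: regress dialog, Model tab

. regress y x1 x2 x3

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  3,   996) =  215.47
       Model |  63294.2507     3  21098.0836           Prob > F      =  0.0000
    Residual |  97526.2879   996  97.9179598           R-squared     =  0.3936
-------------+------------------------------           Adj R-squared =  0.3917
       Total |  160820.539   999   160.98152           Root MSE      =  9.8954

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .476147   .0303301    15.70   0.000     .4166288    .5356652
          x2 |   1.974713   .1090362    18.11   0.000     1.760746     2.18868
          x3 |   .0072188   .1083894     0.07   0.947     -.205479    .2199165
       _cons |  -8.906289   1.697041    -5.25   0.000    -12.23647   -5.576103
------------------------------------------------------------------------------

(1) Checking the fit
With n (n >= 3) variables the fit produced by regress cannot be drawn as a simple two-dimensional scatterplot, so the rvfplot command is used instead.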
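. rvfplot, yline(0) *2

For each observation (x_{1i}, x_{2i}, ...) the model gives a fitted value \hat{y}_i, and the difference y_i - \hat{y}_i from the observed value y_i is the residual. rvfplot (residual-versus-fitted plot) plots the residuals against the fitted values \hat{y}_i. The residuals scatter around 0, with a spread in line with the noise standard deviation of 10 used to generate y.

(2) R-squared
The upper-left block of the regress output is an ANOVA table. Its SS (sum of squares) column decomposes the variation of y. Writing y_i (i = 1, ..., n) for the observed values, \bar{y} for their mean and \hat{y}_i for the fitted values,

\[ \sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2 \tag{2} \]

where

  \sum_i (y_i - \bar{y})^2        TSS (total sum of squares)
  \sum_i (\hat{y}_i - \bar{y})^2  MSS (model sum of squares)
  \sum_i (y_i - \hat{y}_i)^2      RSS (residual sum of squares)

In this example TSS = 160820.539, MSS = 63294.2507 and RSS = 97526.2879.

*2 Menu: Statistics > Linear models and related > Regression diagnostics > Residual-versus-fitted plot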
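The figures in the ANOVA block can be cross-checked by hand. The following Python sketch uses only numbers printed in the regress output of 1.2; the variable names are chosen here for illustration. It confirms the decomposition (2), shows that each MS entry is SS divided by its df, and recovers the F statistic and Root MSE from the header.

```python
# Values copied from the ANOVA block of the regress output in section 1.2.
mss, df_model = 63294.2507, 3     # Model row
rss, df_resid = 97526.2879, 996   # Residual row
tss, df_total = 160820.539, 999   # Total row

# Equation (2): TSS = RSS + MSS (up to rounding in the printed output).
decomposition_gap = abs((mss + rss) - tss)

# The MS column is SS / df.
ms_model = mss / df_model
ms_resid = rss / df_resid
ms_total = tss / df_total

# F in the header is Model MS / Residual MS; Root MSE is sqrt(Residual MS).
f_stat = ms_model / ms_resid
root_mse = ms_resid ** 0.5

print(f"gap = {decomposition_gap:.4f}")
print(f"MS: model = {ms_model:.4f}, residual = {ms_resid:.4f}, total = {ms_total:.4f}")
print(f"F = {f_stat:.2f}, Root MSE = {root_mse:.4f}")
```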
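The coefficient of determination is defined as

\[ R^2 = \frac{\mathit{MSS}}{\mathit{TSS}} = 1 - \frac{\mathit{RSS}}{\mathit{TSS}} \tag{3} \]

Here 63294.2507/160820.539 = 0.3936, the value shown as R-squared to the right of the ANOVA table: the model accounts for about 39% of the variation in y.

(3) p-value of the model
Prob > F is the p-value of the ANOVA (analysis of variance) F test. Its null hypothesis is

  H_0: \beta_1 = \beta_2 = ... = 0   (\beta_0 is not included)

(4) Coefficients
The Coef. (coefficients) column below the ANOVA table gives the estimates \beta_1 = 0.48, \beta_2 = 1.97, \beta_3 = 0.01 and \beta_0 = -8.91, close to the values in equation (1).

(5) p-values of the coefficients
The p-value in each row comes from a t test of \beta_j = 0. For x3 the p-value is 0.947, so \beta_3 = 0 cannot be rejected; accordingly the 95% CI (confidence interval) for \beta_3, [-0.21, 0.22], contains 0. x3 thus shows no effect, while for x1 and x2 the p-value is 0.000.

regress reports p-values only for the hypothesis that a coefficient is 0. To test, for example, whether the coefficient of x1 equals 0.5, use test, one of the postestimation commands.

Menu: Statistics > Postestimation > Tests > Test linear hypotheses
  Main tab: Test type for specification 1: Linear expressions are equal
            Specification 1, linear expression: x1 = 0.5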
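Figure 2: test dialog, Main tab

. test (x1 = 0.5)

 ( 1)  x1 = .5

       F(  1,   996) =    0.62
            Prob > F =    0.4318

The p-value is 0.43, so the hypothesis that the coefficient of x1 equals 0.5 is not rejected.

(6) Standard errors and confidence intervals
To the right of Coef. is Std. Err., the standard error, that is, the standard deviation of the coefficient estimate. The 95% CI for x1 can be reproduced from it. The residual degrees of freedom are 996, so the required quantile of the t distribution is given by the function invttail(n, p) (see [D] functions).

. display invttail(996, 0.025) *3
1.9623486

This is the 95% critical value. The limits of the interval follow as:

. display 0.476147 - 1.9623486*0.0303301
.41662877

. display 0.476147 + 1.9623486*0.0303301
.53566523

These agree with the 95% CI for x1 in the regress output, [0.4166288, 0.5356652].

*3 Menu: Data > Other utilities > Hand calculator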
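The same arithmetic can be replayed in a few lines of Python. This sketch takes the critical value 1.9623486 as printed by invttail(996, 0.025) rather than recomputing it, and uses the fact that for a single linear restriction the F statistic reported by test is the square of the corresponding t statistic. Variable names are illustrative only.

```python
# x1 row of the regress output in section 1.2.
coef, se = 0.476147, 0.0303301
t_crit = 1.9623486  # value displayed by: display invttail(996, 0.025)

# 95% confidence interval: coef +/- t_crit * se
half_width = t_crit * se
ci_lower = coef - half_width
ci_upper = coef + half_width

# test (x1 = 0.5): one restriction, so F = t^2 with t = (coef - 0.5) / se
t_stat = (coef - 0.5) / se
f_stat = t_stat ** 2

print(f"95% CI = [{ci_lower:.7f}, {ci_upper:.7f}]")
print(f"F(1, 996) = {f_stat:.2f}")
```

The interval matches the regress output, and the F value matches the 0.62 reported by test.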
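1.3 Effect of the sample size

Section 1.2 used all 1,000 observations. Here the analysis is repeated on 1/10 of them, drawn by resampling. Setting the seed first makes the random draw reproducible.

. set seed 111 *4

Draw a random sample of 100 observations.

. sample 100, count *5
(900 observations deleted)

Run regress on the remaining 100 observations.

. regress y x1 x2 x3

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   15.43
       Model |  5157.47143     3  1719.15714           Prob > F      =  0.0000
    Residual |  10693.3387    96  111.388945           R-squared     =  0.3254
-------------+------------------------------           Adj R-squared =  0.3043
       Total |  15850.8101    99  160.109193           Root MSE      =  10.554

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .583658   .1117937     5.22   0.000     .3617492    .8055668
          x2 |   1.565079   .3726188     4.20   0.000     .8254362    2.304721
          x3 |  -.1063396   .3448411    -0.31   0.758    -.7908438    .5781646
       _cons |  -12.28143   6.422559    -1.91   0.059    -25.03011    .4672456
------------------------------------------------------------------------------

R-squared and the lack of any effect of x3 are much as in 1.2, but the standard errors and confidence intervals are 3 to 4 times wider. The 95% CI for x1 now runs from roughly 0.37 to 0.8. To see how this relates to the 5% level, test both values with test.

*4 See [R] set seed
*5 Menu: Statistics > Resampling > Draw random sample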
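. test (x1 = 0.37)

 ( 1)  x1 = .37

       F(  1,    96) =    3.65
            Prob > F =    0.0590

. test (x1 = 0.8)

 ( 1)  x1 = .8

       F(  1,    96) =    3.74
            Prob > F =    0.0559

Both p-values are just above 0.05, so at a significance level of 5% neither hypothesis is rejected; both values lie just inside the 95% CI. The confidence level can be changed with Confidence level on the Reporting tab (see Figure 3).

2. Quadratic terms and standardized (beta) coefficients

This section uses the Stata Example dataset auto.dta.

. sysuse auto.dta *6
(1978 Automobile Data)

The dataset describes 74 cars from 1978. The aim is to model mpg (miles per gallon) as a function of weight including a quadratic term, weight^2. The quadratic term could be written in factor-variable notation as c.weight#c.weight (see mwp-028), but here it is generated as a new variable.

. generate weightsq = weight^2 *7

*6 Menu: File > Example datasets > Example datasets installed with Stata
*7 Menu: Data > Create or change data > Create new variable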
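. format weightsq %10.0g *8

. list mpg weight weightsq in 1/5 *9

       mpg   weight   weightsq
  1.    22    2,930    8584900
  2.    17    3,350   11222500
  3.    22    2,640    6969600
  4.    20    3,250   10562500
  5.    15    4,080   16646400

Fit the quadratic model in weight.

. regress mpg weight weightsq *10

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   72.80
       Model |  1642.52197     2  821.260986           Prob > F      =  0.0000
    Residual |  800.937487    71  11.2808097           R-squared     =  0.6722
-------------+------------------------------           Adj R-squared =  0.6630
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.3587

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0141581   .0038835    -3.65   0.001    -.0219016   -.0064145
    weightsq |   1.32e-06   6.26e-07     2.12   0.038     7.67e-08    2.57e-06
       _cons |   51.18308   5.767884     8.87   0.000     39.68225    62.68392
------------------------------------------------------------------------------

The fitted model is

\[ \mathit{mpg} = -1.42 \times 10^{-2}\,\mathit{weight} + 1.32 \times 10^{-6}\,\mathit{weight}^2 + 51.18 \]

The coefficients of weight and weight^2 differ by a factor of about 10^4, but this reflects the scale of the variables (weight^2 is of order 10^6 to 10^7), not their relative influence on mpg. To compare influence, the variables must be normalized to mean 0 and standard deviation 1, which is what the beta option does.

*8 Variables window: weightsq > Format weightsq
*9 Menu: Data > Describe data > List data
*10 Menu: Statistics > Linear models and related > Linear regression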
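In the regress dialog:
  Reporting tab: Standardized beta coefficients: checked

Figure 3: regress dialog, Reporting tab

. regress mpg weight weightsq, beta

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   72.80
       Model |  1642.52197     2  821.260986           Prob > F      =  0.0000
    Residual |  800.937487    71  11.2808097           R-squared     =  0.6722
-------------+------------------------------           Adj R-squared =  0.6630
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.3587

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
      weight |  -.0141581   .0038835    -3.65   0.001                -1.901918
    weightsq |   1.32e-06   6.26e-07     2.12   0.038                 1.104148
       _cons |   51.18308   5.767884     8.87   0.000                        .
------------------------------------------------------------------------------

Judged by the beta values, the influence of weight is about 1.7 times that of weight^2.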
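A standardized coefficient is the raw coefficient rescaled by the standard deviations of the variables, beta = b * sd(x) / sd(y). The Python sketch below is a consistency check on the printed output, not a re-estimation: it takes sd(mpg) from the Total MS entry (TSS/(n-1), the sample variance of mpg), inverts the relation to obtain the standard deviation of weight that the output implies, and computes the ratio of the two beta values. The variable names are illustrative.

```python
import math

# Total MS = TSS / (n - 1) is the sample variance of the dependent variable mpg.
sd_mpg = math.sqrt(33.4720474)

# Coef. and Beta columns from: regress mpg weight weightsq, beta
b_weight, beta_weight = -0.0141581, -1.901918
beta_weightsq = 1.104148

# beta = b * sd(x) / sd(y)  =>  sd(x) = beta * sd(y) / b
implied_sd_weight = beta_weight * sd_mpg / b_weight

# Relative size of the two standardized coefficients.
beta_ratio = abs(beta_weight) / abs(beta_weightsq)

print(f"sd(mpg) = {sd_mpg:.4f}")
print(f"implied sd(weight) = {implied_sd_weight:.1f}")
print(f"|beta_weight| / |beta_weightsq| = {beta_ratio:.2f}")
```

The implied standard deviation of weight can be compared with the output of summarize weight in Stata. The same inversion for weightsq is less precise because its coefficient is printed to only three significant digits.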
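3. Regression without a constant term

In auto.dta, consider the regression of weight on length. The constant \beta_0 is the predicted weight at length = 0, which should be 0. To impose \beta_0 = 0, use the noconstant option.

Menu: Statistics > Linear models and related > Linear regression
  Model tab: Dependent variable: weight
             Independent variables: length
             Suppress constant term: checked

. regress weight length, noconstant

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    73) = 3450.13
       Model |   703869302     1   703869302           Prob > F      =  0.0000
    Residual |  14892897.8    73  204012.299           R-squared     =  0.9793
-------------+------------------------------           Adj R-squared =  0.9790
       Total |   718762200    74   9713002.7           Root MSE      =  451.68

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      length |   16.29829   .2774752    58.74   0.000     15.74528     16.8513
------------------------------------------------------------------------------

weight increases by about 16.3 for each increase of 1 in length.

4. Weighted regression

This section uses the Example dataset census9.dta.

. use http://www.stata-press.com/data/r11/census9, clear *11
(1980 Census data by state)

The data come from the 1980 census. The first 5 observations are listed next.

*11 Menu: File > Example Datasets > Stata 11 manual datasets > Base Reference Manual [R] > regress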
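. list state drate pop medage region in 1/5, nolabel *12

       state        drate          pop   medage   region
  1.   Alabama         91    3,893,888    29.30        3
  2.   Alaska          40      401,851    26.10        4
  3.   Arizona         78    2,718,215    29.20        4
  4.   Arkansas        99    2,286,435    30.60        3
  5.   California      79   23,667,902    29.90        4

The aim is to model the death rate (drate) by median age (medage) and region (region). region takes the values Northeast, North Central, South and West, coded 1, 2, 3 and 4.

drate and medage are summaries over each state's population pop, so the regression should be weighted by pop (on weights, see mwp-027).

Figure 4: regress dialog, Weights tab

The Weights tab offers Analytic weights and Frequency weights, among others. Analytic weights are the ones to use here: for Alabama, for example, the median age 29.3 is a single summary value for 3,893,888 people.

*12 The nolabel option lists region as numeric codes instead of value labels.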
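Menu: Statistics > Linear models and related > Linear regression
  Model tab:   Dependent variable: drate
               Independent variables: medage i.region *13
  Weights tab: Analytic weights: pop

. regress drate medage i.region [aweight = pop]
(sum of wgt is   2.2591e+08)

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  4,    45) =   37.21
       Model |   4096.6093     4  1024.15232           Prob > F      =  0.0000
    Residual |  1238.40987    45  27.5202192           R-squared     =  0.7679
-------------+------------------------------           Adj R-squared =  0.7472
       Total |  5335.01916    49  108.877942           Root MSE      =   5.246

------------------------------------------------------------------------------
       drate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      medage |   4.283183   .5393329     7.94   0.000     3.196911    5.369455
             |
      region |
          2  |   .3138738   2.456431     0.13   0.899    -4.633632     5.26138
          3  |  -1.438452   2.320244    -0.62   0.538    -6.111663    3.234758
          4  |  -10.90629   2.681349    -4.07   0.000    -16.30681   -5.505777
             |
       _cons |  -39.14727   17.23613    -2.27   0.028    -73.86262   -4.431915
------------------------------------------------------------------------------

The weights are specified as [aweight = pop].

5. Robust standard errors

regress estimates the coefficients by minimizing the sum of squared residuals (OLS: ordinary least squares). The OLS standard errors assume that the error variance is constant (homoskedasticity). This section again uses the Example dataset auto.dta.

. sysuse auto, clear
(1978 Automobile Data)

. replace weight = weight/1000 *14

*13 On the i. prefix, see mwp-028.
*14 Menu: Data > Create or change data > Change contents of variable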
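. regress mpg weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
       Model |  1591.99024     1  1591.99024           Prob > F      =  0.0000
    Residual |  851.469221    72  11.8259614           R-squared     =  0.6515
-------------+------------------------------           Adj R-squared =  0.6467
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -6.008687   .5178782   -11.60   0.000    -7.041058   -4.976316
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------

Plot the data together with the fitted line (the dialog steps are given in Supplement 1).

. twoway (scatter mpg weight) (lfit mpg weight), ytitle(mpg) *15

Next, plot the residuals against the predictor with rvpplot (residual-versus-predictor plot).

. rvpplot weight, yline(0) *16

*15 Menu: Graphics > Twoway graph (scatter, line, etc.)
*16 Menu: Statistics > Linear models and related > Regression diagnostics > Residual-versus-predictor plot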
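In the rvpplot the spread of the residuals changes with weight, which suggests non-constant error variance (heteroskedasticity). This can be tested formally.

Menu: Statistics > Postestimation > Reports and statistics
  estat dialog: Reports and statistics: Tests for heteroskedasticity (hettest)

Figure 5: estat dialog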
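. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of mpg

         chi2(1)      =    11.05
         Prob > chi2  =   0.0009

The null hypothesis H_0 of estat hettest is that the variance is constant. The p-value is 0.0009, so H_0 is rejected. The homoskedasticity assumption behind the OLS standard errors does not hold, and the OLS standard errors cannot be relied on. The SE/Robust tab of regress offers robust alternatives for the SE (standard error).

Menu: Statistics > Linear models and related > Linear regression
  Model tab:     Dependent variable: mpg
                 Independent variables: weight
  SE/Robust tab: Robust

Figure 6: regress dialog, SE/Robust tab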
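. regress mpg weight, vce(robust)

Linear regression                                      Number of obs =      74
                                                       F(  1,    72) =  105.83
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6515
                                                       Root MSE      =  3.4389

------------------------------------------------------------------------------
             |               Robust
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -6.008687   .5840839   -10.29   0.000    -7.173037   -4.844337
       _cons |   39.44028    1.98832    19.84   0.000     35.47664    43.40393
------------------------------------------------------------------------------

With vce(robust) the coefficients are the same as under OLS, but the standard errors and hence the 95% CIs change:

             OLS                 Robust
  weight     [-7.04, -4.98]      [-7.17, -4.84]
  _cons      [36.22, 42.66]      [35.48, 43.40]

The SE/Robust tab also offers Clustered robust standard errors.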
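The comparison between the OLS and robust intervals can be checked from the printed numbers alone. The Python sketch below, with illustrative names, shows that both intervals are centered on the same coefficient and use the same t critical value (72 degrees of freedom), so the robust interval is wider exactly in proportion to the larger standard error.

```python
# weight row of the two regress outputs in section 5.
coef = -6.008687
se_ols, se_robust = 0.5178782, 0.5840839
ci_ols = (-7.041058, -4.976316)
ci_robust = (-7.173037, -4.844337)


def implied_t_crit(ci, se):
    """Critical value implied by a symmetric interval: half-width / se."""
    return (ci[1] - ci[0]) / (2 * se)


def midpoint(ci):
    """Center of the interval, which should equal the coefficient."""
    return (ci[0] + ci[1]) / 2


t_ols = implied_t_crit(ci_ols, se_ols)
t_robust = implied_t_crit(ci_robust, se_robust)
se_ratio = se_robust / se_ols
width_ratio = (ci_robust[1] - ci_robust[0]) / (ci_ols[1] - ci_ols[0])

print(f"implied t critical value: OLS = {t_ols:.4f}, robust = {t_robust:.4f}")
print(f"SE ratio = {se_ratio:.4f}, CI width ratio = {width_ratio:.4f}")
print(f"midpoints: OLS = {midpoint(ci_ols):.6f}, robust = {midpoint(ci_robust):.6f}")
```

In this example the robust standard error of weight is about 13% larger than the OLS one, and the interval widens by the same factor.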
Supplement 1: Dialog steps for the scatterplot with fitted line in section 5

Menu: Graphics > Twoway graph (scatter, line, etc.)
  Plots tab: Create > Plot 1
    Choose a plot category and type: Basic plots
    Basic plots: Scatter
    Y variable: mpg
    X variable: weight
  Plots tab: Create > Plot 2
    Choose a plot category and type: Fit plots
    Fit plots: Linear prediction
    Y variable: mpg
    X variable: weight
  Y axis tab: Title: mpg

. twoway (scatter mpg weight) (lfit mpg weight), ytitle(mpg)