  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   13.1   Copyright 1985-2013 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
     Special Edition                  College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

Single-user Stata perpetual license:
       Serial number:  401306001168
         Licensed to:  Houston H. Stokes
                       Econometric Software and Consulting

Notes:
      1.  (/v# option or -set maxvar-) 5000 maximum variables
      2.  Stata running in batch mode

. do error_in_var.do

. //
. // Implements "Errors in Variables" Problem
. // Stata implementation from b34s code
. // Suggested by Helen Roberts to illustrate bias example
. //
. // Problem found especially in testing of permanent income hypothesis
. //
. // True Model      y = a + b*x + e
. // Estimated Model y = a + bb*xstar + ee
. //    where xstar = x + gamma
. //
. //    gamma is a random measurement error
. //
. // Note: true x = xstar - gamma
. //
. // The true model can be written in terms of xstar as
. //       y = a + b*(xstar - gamma) + e
. //         = a + b*xstar + (e - b*gamma)
. //
. // => the error (e - b*gamma) is correlated with xstar. But OLS
. //    forces its residual to be uncorrelated with xstar by construction.
. // => bias in the estimate of b
. //
. // Theil shows (in eq 2.7 on page 608)
. //
. //    plim bb (n goes to inf) = b*(var(x)/(var(gamma) + var(x)))
. //
. // => the greater the variance of gamma, the lower the estimated b is
. //    relative to the true b
. //
. // See Theil (1971, Page 608-610)
. //
. // Note: Example shows a test case that uses 50000000 cases to demonstrate
. //       the Theil formula
. //
. set obs 50000000
obs was 0, now 50000000

. gen mult = 4

. gen coef = 5.1

. gen error =rnormal()

. gen x =rnormal()

. gen y =1. + coef*x + rnormal()

. gen gamma = mult*rnormal()

. gen xstar = x+gamma

. regress y x

      Source |          SS        df          MS   Number of obs  = 50000000
-------------+----------------------------------   F(1, 49999998) =        .
       Model |  1.3002e+09         1  1.3002e+09   Prob > F       =   0.0000
    Residual |  49993091.4  49999998  .999861868   R-squared      =   0.9630
-------------+----------------------------------   Adj R-squared  =   0.9630
       Total |  1.3502e+09  49999999   27.003679   Root MSE       =   .99993

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.100243   .0001414  3.6e+04   0.000     5.099966     5.10052
       _cons |   1.000129   .0001414  7072.47   0.000     .9998514    1.000406
------------------------------------------------------------------------------

. regress y xstar

      Source |          SS        df          MS   Number of obs  = 50000000
-------------+----------------------------------   F(1, 49999998) =        .
       Model |  76491297.3         1  76491297.3   Prob > F       =   0.0000
    Residual |  1.2737e+09  49999998  25.4738536   R-squared      =   0.0567
-------------+----------------------------------   Adj R-squared  =   0.0567
       Total |  1.3502e+09  49999999   27.003679   Root MSE       =   5.0472

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       xstar |    .299946   .0001731  1732.84   0.000     .2996068    .3002853
       _cons |   .9993755   .0007138  1400.12   0.000     .9979765    1.000774
------------------------------------------------------------------------------

. summ x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |  50000000   -.0001247    .9998335  -5.741539    5.97508

. gen var_x=r(Var)

. summ gamma

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       gamma |  50000000    .0005157    4.000475  -23.39588   24.43228

. gen var_gamma=r(Var)

. gen test = coef*(var_x/(var_gamma+var_x))

. summ test

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        test |  50000000     .299839           0    .299839    .299839

.
end of do-file
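
The check at the end of the log stores each variance with gen, which fills a constant column of 50,000,000 rows just to hold one number. The following is a minimal sketch of the same comparison using scalars instead. It is not the original do-file: the smaller sample size (100,000), the seed, and the scalar names b_true, mult_g, b_ols, and b_plim are assumptions made for illustration. With var(x) = 1 and var(gamma) = 4^2 = 16, the formula gives 5.1*1/(1 + 16) = 0.3, so both displayed values should land near 0.3, up to sampling noise.

```stata
// Sketch only: sample size, seed, and scalar names are illustrative assumptions
clear
set seed 20130101
set obs 100000

scalar b_true = 5.1          // true slope, as coef in the log
scalar mult_g = 4            // std. dev. of the measurement error, as mult in the log

gen x     = rnormal()
gen gamma = scalar(mult_g)*rnormal()
gen xstar = x + gamma        // x observed with error
gen y     = 1 + scalar(b_true)*x + rnormal()

// OLS slope from the mismeasured regressor
quietly regress y xstar
scalar b_ols = _b[xstar]

// Sample variances kept as scalars rather than as full-length variables
quietly summarize x
scalar var_x = r(Var)
quietly summarize gamma
scalar var_gamma = r(Var)

// Probability limit from the formula quoted in the log comments
scalar b_plim = scalar(b_true)*scalar(var_x)/(scalar(var_gamma) + scalar(var_x))

display "OLS slope on xstar      = " scalar(b_ols)
display "Theil probability limit = " scalar(b_plim)
```

Changing mult_g shows the attenuation directly: a larger measurement-error variance pulls both numbers further toward zero, while mult_g = 0 should bring the slope back to about b_true.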