3.0 REGRESSION Command

The B34S REGRESSION command supports a comprehensive regression
option which includes GLS models and various BLUS and other tests
of equation specification. If only OLS is desired, look at the REG,
RR or QR commands. The REG command gives a number of options,
especially with pooled data. If a large number of lags is desired,
the REG command does not require that the lags be explicitly built.
Balanced error component models can be estimated with the ECOMP
command or, for fixed effects models, with the panel_lib routines
in staging2.mac.

A basic reference for some of the equation specification testing
in the REGRESSION command is:

   - Theil, H., Principles of Econometrics, Wiley 1971.

Form of the REGRESSION command:

   B34SEXEC REGRESSION options parameters $
      MODEL Yvar = Xvar1 Xvar2 Xvar3 $
      COMMENT=(' ') $
      ORDER Varn1 Varn2 ... $
      RA options parameters $
      DM var1 var2 $
      B34SEEND$

The MODEL sentence is required.

REGRESSION sentence options.

   NOINT      - Will estimate model without an intercept, otherwise
                an intercept will be used.

   STEPWISE   - Gives stepwise output. If STEPWISE is not given,
                only the last step is printed.

   RESIDUALA  - Gives residual analysis. This is recommended.

   RESIDUALP  - Gives residual analysis and plots. If GLS and/or
                BLUS residuals are calculated, the options
                NOBLUSPLOT and NOGLSPLOT will turn off plots for
                BLUS and GLS residuals respectively.

   NOBLUSPLOT - Turns off BLUS residual plots if RESIDUALP has
                been set.

   NOGLSPLOT  - Turns off GLS plots if RESIDUALP has been set.

   PUNCHRES   - Gives residual analysis and plot and punches on
                unit 37 RESIDUAL, NPROB, NSUB, KOUNT in format
                (E15.8,5X,3I5).

   SPUNCHRES  - Places residuals on unit 44 in SCA FSAVE format to
                a file with name RESIDUAL. Series listed are
                OBSNUM, Y, YHAT, RESIDUAL. NPROB and NSUB are
                listed as comments in the file. File is not
                rewound prior to saving. Use SCAINPUT to rename
                this file if SPUNCHRES is used for subsequent
                REGRESSION commands.
                If this is not done, only the first file will be
                read. The regression model is listed in comment
                cards.

   ZPUNCHRES  - This option is no longer supported.

   MANYDIGITS - Will give extra digits for regression coefficient
                printing.

   PCOEF      - Will output coefficients on unit 37. Note: The
                MANYDIGITS option must not have been set if PCOEF
                is set. Format used is:
                   'REGRESSION COEFFICIENTS',I5,2I6
                which passes NPROB,NSUB,IGLS. Next is placed the
                subheader card
                   YNAME ICODE VAL1 VAL2 ESS TSS DFDR TDF NPROB NSUB
                using format (A8,I3,4E12.5,I4,I7,I3,I6) where
                   ICODE = 0
                   VAL1  = VAL2 = 0.0
                   ESS   = explained sum of squares
                   TSS   = total sum of squares
                   DFDR  = degrees of freedom regression
                   TDF   = total degrees of freedom.
                Subsequent cards contain
                   XNAME ICODE COEFF SE PCOR ELAST ID1 ID2 NPROB NSUB
                using the same format. ID1=ID2=0. If multiple
                REGRESSION commands have the PCOEF option, all
                coefficients will be saved in the same file. NPROB
                and NSUB can be used to determine which
                coefficients go with which regression. If the
                constant is forced into the regression, the values
                of PCOR and ELAST will be set to 0.0. After the
                coefficients are saved, another header/trailer
                card is placed in the file. By the use of
                header/trailer cards, unit 37 can concurrently be
                used to save coefficients and covariance matrices
                for many regressions in one file. Users are
                encouraged to see the GENMOD command for an
                example where the unit 37 file is used.

   PCOV       - Will output covariance matrix of regression
                coefficients on unit 37. Header and trailer card
                of the form 'id. text' NPROB,NSUB,IGLS using
                format ('COV. MATRIX OF COEF ',I5,2I6). The
                covariance matrix is punched by rows in lower
                triangular form using format (3G24.16).

   NOCOV      - Will delete covariance matrix of regression
                coefficients.

   BRTEST     - Prints sum of adjusted residuals, mean adjusted
                residual, and other tests useful in BLUS analysis.

REGRESSION sentence parameters.

   MAXGLS = n     - Sets maximum order of GLS estimation using
                    Goldberger method.
                    B34S will estimate up to order n, depending on
                    value of TOLG. The GLSGRID parameter is an
                    alternative to the MAXGLS approach. If Bayes
                    analysis is performed and the max value of
                    MAXGLS is greater than 1, it will be reset
                    to 1.

   TOLG = r1      - Sets the convergence tolerance for smoothing
                    data. This tolerance is applied to the maximum
                    of the absolute value of the first n
                    autocorrelation coefficients. If this
                    tolerance is not specified, the program sets
                    it to (1.0 / sqrt(NOOB)), which is an estimate
                    of the SE of the autocorrelation coefficient.
                    If incore BLUS analysis is done, the last set
                    of BLUS residuals is used in place of the OLS
                    residuals to check if GLS should be done. If
                    the heteroskedasticity BLUS base was used,
                    this may not be appropriate. To force GLS set
                    TOLG=.1E-9.

   NTAC=n1        - Sets the number of terms in the autocorrelation
                    function of the residuals. The range is 1-30.
                    The default is MAX(4,(MAXGLS+1)).

   BLUS= key      - Sets out of core BLUS residual option.

                    FIRST  - Use the first K observations as BLUS
                             base.
                    MIDDLE - Use middle K observations as the base
                             (this is the best base to test for
                             heteroskedasticity).
                    LAST   - Use the last K observations as the
                             base.
                    (N1,N2,...,NK)
                           - Specifies the BLUS observation base.
                             User must specify K distinct
                             observations.
                    BEST   - Allow the program to choose from
                             among the K+1 possible adjacent bases
                             by choosing that base which maximizes
                             the sum of square roots of the
                             eigenvalues. The K+1 possible bases
                             are
                             1  First K and last 0 observations.
                             2  First K-1 and last 1 observations.
                             .
                    HET    - Choose K observations at equal
                             intervals from the middle third of
                             the sample.
                    BOTH   - Calculate BLUS residuals first using
                             option 5, then option 2.

                    Note: The RA card allows for incore BLUS
                    options which support sorted data. Currently
                    both incore and out of core BLUS options have
                    a limit of 20 variables on the right of any
                    equation.

   TOLL=r2        - Checks prospective variables for
                    multicollinearity with variables presently in
                    the equation (via inspection of the reduced
                    diagonal element) by calculating a regression
                    of prospective variables against all presently
                    included variables. If (1 - this Rsquare) is
                    less than TOLL (whose default = .00001), the
                    prospective variable will not enter. In
                    addition a computational error estimate is
                    presented. Users lower TOLL at their own risk.

   EFIN=r3        - Minimum F level for inclusion of a variable.
                    Default=.01.

   FOUT=r4        - Minimum F level for variables in the equation
                    before they will be thrown out. Default=.005.

   BAYES=key      - Will give Bayesian regression output.
                    key=BAYPLOT   Will plot Bayesian output.
                    key=BAYLISTP  Will list and plot Bayesian
                                  output.

   NBE=n2         - Sets number of points in plotting grid for
                    Bayes. Default = 5. Max = 100.

   NRO=n3         - Sets number of points for plotting Bayes
                    estimate of p. Default = 0 (marginal is
                    suppressed). Max = 100.

   RLO=r5         - Lower limit of integration for estimating p by
                    Bayesian methods.
                    Default = p - 4(T-K)**(-.5)

   RHI=r6         - Upper limit of integration for estimating p by
                    Bayesian methods.
                    Default = p + 4(T-K)**(-.5)

   NRS=n4         - Number of points in plotting grid for R**2
                    estimated with Bayesian methods. Max = 100.
                    If the cumulative density does not sum to 1.0,
                    increase NRS.

   ILDPV=Variable - Sets the name of the lagged dependent variable
                    to be used in the calculation of the corrected
                    DW test.

   Note: The Bayesian regression options are very computer
   intensive and should be used with care. The R**2 calculation
   takes time. Increasing the number of points used increases both
   accuracy and computational cost. If MAXGLS > 0 and BAYES is
   set, Bayesian analysis will be done on GLS only and the maximum
   of MAXGLS will be 1. The Bayes option is experimental. Users
   are warned to use caution in the interpretation and use of the
   results.

   GLSGRID=n5     - Perform a GLS grid search in n5+1 steps between
                    PHO and PHI. The maximum number allowed is 99.
                    If n5=0, GLS is performed using PHO.

   PHO=r6         - Lower limit for GLS grid search. Default=.4.
                    The maximum number of digits is 3.

   PHI=r7         - Upper limit for GLS grid search. Default=.95.
                    The maximum number of digits is 3.

   NUMPROB=n6     - Sets problem # for identification purposes
                    only. Valid values must be in the range 0-999.

MODEL sentence.

   MODEL Y = X1 X2 X3 $

   Where Y is the left hand variable and X1 X2 X3 are right hand
   variables. Y, X1, X2 and X3 must have been passed to B34S. The
   maximum number of right hand variables = 68.

ORDER sentence.

   ORDER Xi Xj $

   Specifies the order of variables that must be in the equation.
   This option is useful only with the STEPWISE option. If it is
   desired to force in the constant, use the name CONSTANT on the
   ORDER sentence.

COMMENT sentence.

   COMMENT=(' ') $

   The COMMENT sentence allows printing of a regression comment.
   Place the comment (up to 72 characters) between (' and ').
   The delimiter character ($) or the keywords B34SEXEC or
   B34SEEND must not be placed in the comment. There is no limit
   on the number of COMMENT sentences.

RA sentence.

   The RA sentence allows calculation of specialized equation
   specification tests to test for the correct functional form of
   the equation.

Options on the RA sentence.

   GRAPH      - Graphs the residual against appropriate X variable.

   LIST       - Lists resorted residual against first X variable
                only.

   LISTA      - Lists resorted residual against all X variables.

   CROSS      - Performs cross correlation analysis on OLS
                residuals only. For this option the number of
                series listed with the VARS parameter must be
                even.

   AUTO       - Performs autocorrelation analysis with OLS
                residuals.

   NONONLIN   - Turns off the nonlinearity tests for OLS residuals.

Parameters on the RA sentence (VARS is required).

   VARS=(X1,X2,...,Xk)  Specifies the variables to test for
                        misspecification. Max = 8.

   RESID=KEY            where KEY is set to determine what
                        residuals are to be used. Options for KEY
                        include:

                        OUTBLUS = BLUS residuals from REGRESSION
                                  out of core BLUS procedure.
                                  OUTBLUS would be used with cross
                                  section data that would not fit
                                  in core.

                        OLS     = OLS out of core residuals used.
                                  This is the default value for
                                  KEY.

                        ALL     = Both OUTBLUS and OLS residuals
                                  used.

                        Note: The following 4 options instruct the
                        RA option to calculate INCORE BLUS
                        residuals for various tests. These tests
                        are the most powerful available in the RA
                        option. For further detail, see Theil
                        (1971) Chapter 5. Currently there is a
                        limit of 20 variables on the right of any
                        equation for which incore BLUS tests are
                        requested.

                        CONVEX  = Calculate BLUS residuals for
                                  convexity test using the MVN
                                  ratio tests.

                        HET     = Calculate BLUS residuals for
                                  heteroskedasticity test.

                        PARAB   = Calculates the parabola
                                  convexity test.

                        ALLBLUS = Calculates CONVEX, HET and PARAB
                                  tests.

   DIF=n                Sets differencing. This option should only
                        be used if BLUS residuals are not
                        calculated (RESID=OLS).
                        n=0  No differencing. This is the default.
                        n=1  Up to first differencing.
                        n=2  Up to second differencing.
                        n=3  Only first differencing.
                        n=4  Only second differencing.

   PERIOD=j             Number of periods of autocorrelations and
                        cross correlations. The max value of
                        PERIOD = 60. If PERIOD is not set, B34S
                        sets to maximum of 60 and (NOOBS/4).

Examples of RA card.

   Battery of BLUS tests done on Model

   b34sexec regression residuala$
      comment=('Incore blus tests done on Model')$
      model y = x1 x2$
      ra resid=allblus vars=(x1,x2)$
      b34seend$

   Residuals Autocorrelated

   b34sexec regression residuala$
      comment=('Autocorrelation analysis of OLS residuals')$
      model y = x1 x2$
      ra resid=ols auto vars=(x1,x2)$
      b34seend$

DM sentence.

   The DM sentence is optionally used with the BAYES parameter to
   delete the marginals for selected variables. The form of the DM
   sentence is

   DM Var1 Var2 $

   A maximum of 30 variables can be specified.

Sample setup.

   B34SEXEC REGRESSION$
      MODEL Y = X1 X2 X3 $  * runs 3 variable model $
      B34SEEND$

More complex setup showing GLS

   B34SEXEC REGRESSION MAXGLS=3 STEPWISE$
      MODEL Y = X1 X2 X3 X4 $
      COMMENT=('Main Model') $
      ORDER CONSTANT X1 $
      B34SEEND$
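
Further setups.

   The two setups below are minimal, untested sketches assembled
   only from options and parameters documented in this section.
   The series names Y, X1 and X2 and all parameter values are
   illustrative assumptions. The series must already have been
   passed to B34S.

   GLS grid search used in place of MAXGLS. PHO and PHI are set to
   their documented defaults. With GLSGRID=10 the search should
   take 11 steps, following the n5+1 rule given under GLSGRID.

   B34SEXEC REGRESSION GLSGRID=10 PHO=.4 PHI=.95 RESIDUALA$
      COMMENT=('GLS grid search between PHO and PHI') $
      MODEL Y = X1 X2 $
      B34SEEND$

   Out of core BLUS residuals, using the middle K observations as
   the base, passed to the RA sentence and graphed against X1 and
   X2.

   B34SEXEC REGRESSION BLUS=MIDDLE RESIDUALA$
      COMMENT=('Out of core BLUS residuals with middle base') $
      MODEL Y = X1 X2 $
      RA RESID=OUTBLUS GRAPH VARS=(X1,X2) $
      B34SEEND$

   BLUS=MIDDLE is chosen here because the BLUS parameter
   description names it as the best base to test for
   heteroskedasticity. Substitute another base if a different
   test is intended.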