44.0 ROBUST Command

The ROBUST command estimates models in which lags of the variables do not have to be explicitly built, and calculates OLS, L1 and MINIMAX estimators. Unlike the REGRESSION command, the REG, RR and ROBUST commands load data into memory, so the size of the largest problem is limited by the amount of memory that can be allocated.

The ROBUST command allows panel data models that are not rectangular to be estimated by use of an identifier variable, which may be a character variable. The estimated coefficients, e'e, number of observations, R**2, largest residual and sum of absolute values of the residuals can be saved in a DMF file along with an identifier variable. Residuals can also be saved in an SCA FSAV file.

The ROBUST command allows estimation of models for the complete sample in two important situations: without panel data (the usual case) and with panel data. With panel data, B34S will automatically delete the appropriate number of observations to handle lags as the estimation moves across the panel.

The ROBUST command uses modified routines originally obtained from version 8 of the IMSL library. L1 estimators minimize sum(abs(y(i)-yhat(i))) and are not as sensitive to outliers as OLS. MINIMAX estimators minimize max(abs(y(i)-yhat(i))) and are more sensitive to outliers than OLS. While by its very construction e'e for OLS is less than or equal to e'e for all other linear estimators, the coefficients of OLS are subject to bias due to outliers. For L1 and MINIMAX, this implementation of the ROBUST command produces coefficients only. If SE type measures are desired for L1, it is suggested that the user consider STATA.

The ROBUST command is a special case of the REG command designed to explore alternative estimators. The ROBUST command supports the BISPEC sentence, which allows a number of tests for nonlinearity, autocorrelation, ARCH etc. If only OLS is desired, use the REG, REGRESSION or QR commands.
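The three criteria described above can be illustrated with a constant-only model, where each estimator has a simple closed form. The following Python sketch is not B34S code and does not reproduce the IMSL-based routines; it assumes a model with an intercept and no regressors, and the function names fit_constant and criteria are hypothetical. In that special case the OLS fit is the mean, the L1 fit is the median, and the MINIMAX fit is the midrange.

```python
from statistics import mean, median


def fit_constant(y):
    """Constant-only fits under the three criteria (illustrative assumption).

    OLS     : the mean minimizes the sum of squared residuals (e'e).
    L1      : the median minimizes the sum of absolute residuals.
    MINIMAX : the midrange minimizes the largest absolute residual.
    """
    return {
        "ols": mean(y),
        "l1": median(y),
        "minimax": (min(y) + max(y)) / 2,
    }


def criteria(y, c):
    """Evaluate the three objective functions for the fitted constant c."""
    resid = [yi - c for yi in y]
    return {
        "epe": sum(e * e for e in resid),
        "sum_abs": sum(abs(e) for e in resid),
        "max_abs": max(abs(e) for e in resid),
    }


# Four well-behaved points and one outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
fits = fit_constant(y)
for name, c in fits.items():
    print(name, c, criteria(y, c))
```

With this data the L1 fit stays at 3.0, the OLS fit is pulled to 22.0 and the MINIMAX fit moves to 50.5, which matches the ordering of outlier sensitivity stated above. Each fit also has the smallest value of its own criterion among the three.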
The ROBUST command can produce recursive beta estimates either by adding an observation at a time or by using a moving window. This feature produces OLS betas and SEs, L1 betas and MINIMAX betas, but is substantially slower than the RR command. If results on sorted data are desired, use the SORT command prior to the ROBUST command.

Good references for L1 and MINIMAX are:

1. Barrodale, I. and Roberts, F. D. K. "An Improved Algorithm for Discrete L1 Linear Approximation." SIAM Journal on Numerical Analysis 10, 1973, 839-848.

2. Barrodale, I. and Roberts, F. D. K. "Solution of an Overdetermined System of Equations in the L1 Norm." Communications of the Association for Computing Machinery 17, 1974, 319-320.

3. Barrodale, I. and Phillips, C. "Algorithm 495. Solution of an Overdetermined System of Linear Equations in the Chebyshev Norm." ACM Transactions on Mathematical Software 1(3), 1975, 264-270.

4. Bassett, G. W. and Koenker, R. W. "The Asymptotic Theory of Least Absolute Error Regression." Journal of the American Statistical Association 73, 1978, 618-622.

5. Bassett, G. W. and Koenker, R. W. "A Note on Recent Proposals for Computing L1 Estimates." Computational Statistics & Data Analysis 14, 1992, 207-211.

6. Huber, P. J. "Robust Regression: Asymptotics, Conjectures and Monte Carlo." Annals of Statistics 1, 1973, 799-821.

Computational Notes.

RATS has a Robust procedure listed on page 5-11 that uses weighted least squares to obtain L1 estimates following the approach discussed by Huber (1973). Bassett & Koenker (1992) give a theoretical discussion of why such methods will not work, especially in the case where there are outliers and L1 is really needed. The L1 example in the RATSPGM.MAC file illustrates this problem. Inspection of the output shows that sum(abs(e(t))) for the ROBUST command is much less than what is found with the Huber (1973) approach. The call olsq command under MATRIX provides an alternative to this command.
The general form of the ROBUST command is:

   B34SEXEC ROBUST options parameters$
      MODEL Yvar = Xvar1 Xvar2 $
      BISPEC options parameters$
      TRISPEC options parameters$
      POLYSPEC options parameters$
      REVERSE options parameters$
   B34SEEND$

ROBUST options:

NOOLS - Suppress OLS model listing that gives SE's.

NOL1 - Suppress L1 model estimation.

NOMM - Suppress minimax model estimation.

NOINT - Suppress constant.

PRINT - Print panel or recursive results.

CPRINT - Prints panel or recursive results without a new page for each panel to save space.

RESIDUALP - List residuals with lineprinter plot for complete sample.

PANEL - Data is in panel form. If data is in rectangular form, NREG must be set or SUBKEY must be set. It is assumed that the data is in the form of observations for subset1, subset2 ... If this is not the case, use the SORT command to put the data in the correct form prior to running the ROBUST command. If the data is NOT in rectangular form, SUBKEY must be used to delineate the panels.

SAVERES - Saves the residuals in an SCA FSAVE file on unit FSAVUNIT. For the complete sample the FSAV dataset name is RESIDUAL. For panels, the default is RES0001... The residual is saved as RESIDUAL along with OBSNUM, Y and YHAT. The L1 data is saved as L1RES and L1YHAT. The minimax data is saved as MMRES and MMYHAT. The file is not rewound prior to saving. Use the SCAINPUT command to rename these files. The keyword SPUNCHRES can be used in place of SAVERES. For panel data SAVERES takes a great deal of time doing I/O.

SAVECOEF - Saves panel coefficients and associated statistics in a DMF file. The default dataset name is PCOEF. The panel regression number is saved in IDENT. If a SUBKEY is specified, it is saved. The DMF unit is COEFUNIT. The OLS coefficients are saved with names BETA0001 BETA0002 BETA0003. Linkages between these names, which are needed because of the possibility of lags, and the underlying variables are listed in variable labels.
e'e, R**2, N, variance of Y and the Durbin-Watson values are saved with names EPE, RSQ, NOOB, and YVAR. L1 coefficients are saved with names L1BT0001 L1BT0002 while minimax coefficients are saved with names MMBT0001 MMBT0002.

SAVERCOEF - Saves recursive coefficients and associated statistics in a DMF file. The default dataset name is MCOEF. IBEGIN and IEND show the observations used. The DMF unit is COEFUNIT. The OLS coefficients are saved with names BETA0001 BETA0002 BETA0003. Linkages between these names, which are needed because of the possibility of lags, and the underlying variables are listed in variable labels. e'e, R**2, N, variance of Y and the Durbin-Watson values are saved with names EPE, RSQ, NOOB, and YVAR. L1 coefficients are saved with names L1BT0001 L1BT0002 while minimax coefficients are saved with names MMBT0001 MMBT0002. Forecasts for 0, 1 and k periods ahead are YHAT_0, YHAT_1, YHL1_0, YHL1_1, YHMM_0, YHMM_1 etc. Actual data are Y, Y_1 etc. At the end of the file missing data are saved for Y_1 etc. The B34S DESCRIBE command handles missing data and can be used for summary measures on how well the model is working.

RECURSIVE - Does recursive analysis but does not save the coefficients. This option only makes sense if PRINT or CPRINT is in effect. The RECURSIVE option can be used if NBEGIN points to a value near the end of the dataset.

ONLYSUB - Specifies that only subsample regressions will be calculated. This option is only used with PANEL data and will save space since the complete dataset will not be loaded.

ONLYFULL - Specifies that OLS models on the complete dataset are to be run for panel data but that panel regressions are not going to be run.

DMF - Sets the DMF save format as UNFORMATTED. This is the default. This can also be set as FILEF=DMF. Note that if DMF is used, the DMF file must be allocated as unformatted.

FDMF - Sets the DMF save format as FORMATTED. This makes a more portable file but requires more time and makes files that are 3 times bigger.
This can also be set as FILEF=FDMF.

FORECAST - Will produce forecasts for the complete period.

Note: The ROBUST command will start writing DMF files at the current position of the file. If you wish to add to datasets already on the DMF file, use the POSITION( ) parameter, which is documented in the OPTIONS command. If the desire is to reuse the file, the CLEAN( ) parameter should be used.

ROBUST parameters:

IBEGIN=n1 - If the dataset is not a panel, sets the first observation to use in the analysis. If the dataset is a panel, sets the first observation to use in the panel.

IEND=n2 - If the dataset is not a panel, sets the last observation to use in the analysis. If the dataset is a panel, sets the last observation to use in the panel.

NBEGIN=n3 - Sets number of observations for recursive coefficients model. Default = K+1.

MWINDOW=n4 - Sets number of observations for moving regressions. MWINDOW must be in the range K*3 to NOB-K.

NREG=n3 - Number of observations in each region (sub regression).

SUBKEY=Vname - Sets variable, possibly character, that identifies the subregression for panel data.

DMFUNIT=n4 - Sets the DMF coefficient save unit number. The default is 60.

DMFNAME=k - Sets the DMF coefficient dataset name. The default is PCOEF. The keyword DMFMEMBER can be used in place of DMFNAME. Up to 10 characters can be specified.

DMFMNAME=k - Sets the DMF coefficient dataset name for moving regression. The default is MCOEF. Up to 10 characters can be supplied.

Note: NBEGIN, MWINDOW, DMFMNAME and SAVERCOEF are used with recursive models. SUBKEY, DMFNAME, SAVECOEF and PANEL are used with panel models. Either one or the other has to be used, not both at the same time.

Note: The following parameters set frequency and starting dates for DMF files.

SETFREQ(R) - Sets base frequency. 1. = annual data, .1 = data once per decade. R can be set as real OR integer. If SETFREQ is passed -1, the Julian internal date is reset to unused.

SETYEAR(NN) - Sets base year for annual data. Frequency assumed = 1.
SETMY(M1,Y1) - Sets base year for monthly data. Frequency assumed = 12.

SETQY(Q1,Y1) - Sets base year for quarterly data. Frequency assumed = 4.

SETDMY(D1,M1,Y1) - Sets base year for daily data. Frequency assumed = 365.

FSAVUNIT=n5 - Sets the SCA FSAV residual save unit. The default is 44. DMFUNIT and FSAVUNIT cannot be set to the same unit.

FSAVNAME=k - Sets the SCA FSAVE residual dataset name. For the complete sample, the name is RESIDUAL. For panels the default is RES0001. The keyword FSAVMEMBER can be used in place of FSAVNAME.

CCOMMENTS(' ',' ') - Sets comments for the DMF file saving coefficients. Any number of 72 col comments can be supplied.

RCOMMENTS(' ',' ') - Sets comments for the FSAV file saving residuals. Any number of 72 col comments can be supplied.

The MODEL sentence is required. If PANEL is not in effect, the Hinich tests, which are called by the BISPEC, TRISPEC and POLYSPEC sentences, can be used.

MODEL sentence.

   MODEL Y = X1 X2 X3 X4$

The MODEL sentence lists the left hand variable and the right hand side variables. Unless NOINT is supplied, a constant will be automatically added to the model. In addition to the usual specification, the MODEL sentence in the ROBUST command allows the lags to be set in the command. The command

   MODEL Y = Y{1} X{0 to 3} Z{1}$

is the same as

   MODEL Y = LAGY X LAG1X LAG2X LAG3X LAG1Z$

except that in the former case the lag variables do not have to be built. The advantage of this setup is that the 98 variable limit of B34S is effectively lifted if the added variables are lags.

BISPEC sentence.

The BISPEC sentence performs various nonlinearity, gaussianity and martingale tests suggested by Hinich. The form of the BISPEC sentence in the BTIDEN, BTEST and MARS commands is the same. To save space, detail for this sentence is only given under the BTIDEN command help file. If the BISPEC sentence is given with no options or parameters, gaussianity and nonlinearity tests will be performed using default settings.
The setting

   BISPEC IAUTO ITURNO $

will perform tests for gaussianity and nonlinearity over a grid of admissible values for the bandwidth.

TRISPEC sentence.

The TRISPEC sentence performs 4th order nonlinearity tests suggested by Hinich. Further detail on this sentence is listed under the BTIDEN command.

POLYSPEC sentence.

The POLYSPEC sentence performs various nonlinearity tests suggested by Hinich within the sample. Further detail on this sentence is listed under the BTIDEN command.

REVERSE sentence.

The REVERSE sentence performs various time reversibility tests suggested by Hinich and Rothman. Further detail on this sentence is listed under the BTIDEN command.

Examples.

1. User wants to run a model on the complete sample and do nonlinearity tests. Autocorrelations of the residuals are calculated using the ACF( ) parameter of the BISPEC sentence.

   b34sexec robust$
      model y= x z{1 to 20}$
      bispec iturno iauto acf(24)$
      b34seend$

2. User wants to run regressions on subsamples that are marked by the variable STOCK. Output of the regressions is saved in DMF file myruns.dmf with name of runone. A formatted dmf file is being used and any data in the file is erased prior to the run. The saved betas are reread into b34s, the results are sorted and the lowest 200 betas are listed. Residuals are also saved.
   b34sexec options open('c:myruns.dmf') unit(60) disp=unknown$ b34seend$
   b34sexec options clean(60)$ b34seend$
   b34sexec options open('c:myres.fsv') unit(44) disp=unknown$ b34seend$
   b34sexec options clean(44)$ b34seend$
   b34sexec robust dmfunit=60 dmfmember=runone fdmf
                   fsavunit=44 fsavname=rone
                   panel subkey=stock savecoef saveres$
      model y= x z{1 to 20}$
      b34seend$
   b34sexec data fdmf dmfmember=runone$
      input ident beta0001 beta0002 se000001 se000002 rsq epe dw noob$
      b34seend$
   b34sexec sort$ by beta0001$ b34seend$
   b34sexec list iend=200$ b34seend$

Using an unformatted dmf file the above job would be:

   b34sexec options open('c:myruns.dmf') unit(60) disp=unknown
                    form=unformatted$ b34seend$
   b34sexec options clean(60)$ b34seend$
   b34sexec options open('c:myres.fsv') unit(44) disp=unknown$ b34seend$
   b34sexec options clean(44)$ b34seend$
   b34sexec robust dmfunit=60 dmfmember=runone dmf
                   fsavunit=44 fsavname=rone
                   panel subkey=stock savecoef saveres$
      model y= x z{1 to 20}$
      b34seend$
   b34sexec data filef=dmf dmfmember=runone$
      input ident beta0001 beta0002 se000001 se000002 rsq epe dw noob$
      b34seend$
   b34sexec sort$ by beta0001$ b34seend$
   b34sexec list iend=200$ b34seend$

3. User wants to run a model on the complete sample with L1 only.

   b34sexec robust nools nomm$
      model y= x z{1 to 20}$
      b34seend$

4. User wants to run a model on the complete sample and forecast.

   b34sexec robust forecast$
      model y= x z{1 to 20}$
      b34seend$

5. User wants to run a model on the complete sample, forecast and produce moving coefficients.

   b34sexec options open('c:myruns.dmf') unit(60) disp=unknown$ b34seend$
   b34sexec options clean(60)$ b34seend$
   b34sexec robust forecast savemcoef$
      model y= x z{1 to 20}$
      b34seend$

6. User wants to run a model on the complete sample, forecast and produce moving coefficients with a window of 100.
   b34sexec options open('c:myruns.dmf') unit(60) disp=unknown$ b34seend$
   b34sexec options clean(60)$ b34seend$
   b34sexec robust forecast savemcoef mwindow=100$
      model y= x z{1 to 20}$
      b34seend$
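
The difference between the recursive run in example 5 and the moving window run in example 6 lies in which observations enter each regression. The Python sketch below lists those observation ranges as (IBEGIN, IEND) pairs. It is an illustration only, not B34S code: the function names are hypothetical, and it assumes that the first recursive sample starts at observation 1 and that both schemes advance one observation at a time. The exact indexing used internally by B34S may differ.

```python
def recursive_windows(nob, nbegin):
    """Expanding samples: start with the first nbegin observations,
    then add one observation at a time (assumed to start at observation 1)."""
    return [(1, iend) for iend in range(nbegin, nob + 1)]


def moving_windows(nob, mwindow):
    """Fixed-length samples of mwindow observations that slide forward
    one observation at a time until the end of the data is reached."""
    return [(ibegin, ibegin + mwindow - 1)
            for ibegin in range(1, nob - mwindow + 2)]


# Small illustration with 6 observations.
print(recursive_windows(6, 4))
print(moving_windows(6, 4))
```

For 6 observations the recursive sketch with a starting size of 4 gives the ranges (1,4), (1,5), (1,6), while a moving window of 4 gives (1,4), (2,5), (3,6). Both produce the same number of regressions here, but only the moving window keeps the sample size fixed.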