13.0 DATA Command

Revised April 2003

Overview of capability.

The DATA command loads data into B34S by reading observation by observation, for up to 98 series. If the data are saved variable by variable, use the READVBYV command, which is documented in section 43. The SCAIO command, documented in section 53, reads and writes SCA MAD files. The SCAINPUT command, documented in section 31, reads and writes SCA FSAVE files or RATS(r) POR files.

If more than 98 series are needed, one alternative is to read the data into the MATRIX command and generate a B34S DATA step for a subset of the series. A further option is to save the series in a DMF file and read an extract of them with the b34sexec data command. Note that the MATRIX command can read and write Speakeasy portable checkpoints and Matlab save files.

Since the SAS(r) DATA step has substantial capability to build and manage data files, B34S can be run under SAS(r) rather than stand-alone with the DATA command. The SAS-to-B34S interface works on the Windows, Linux, RS/6000 and Sun versions of B34S using supplied SAS macros. B34S DATA steps can also be generated with SAS on platforms where B34S does not run and then ported to a platform that runs B34S.

The B34S DATA step can be used to load a subset of data from a B34S DMF (Data Management Facility) file. For further detail see section 13.7.

The B34S DATA command reads all data in observation-by-observation format, which is the preferred way. If the data are only available in variable-by-variable format, the READVBYV command should be used to load them. Data loaded by this command can be modified by a subsequent B34S DATA command using the SET option.
Form of DATA command:

B34SEXEC DATA options parameters $
INPUT Var1 Var2 $
BUILD Vark Varj $
CHARACTER Vark Varj $
FORMAT=(' ') $
LABEL Var1=' ' Var2=' ' $
RENAME Oldname=newname $
BANK options parameters $
GEN XNEW = FUNCTION( ) $
COMMENT=('This is a comment') $
DATACARDS $   (or PGMCARDS $)
(data cards are placed here)
B34SRETURN $
B34SEEND $

Overview of sentences in the DATA command.

INPUT       - Specify the variables to be loaded.
BUILD       - Specify the names of variables to be built.
CHARACTER   - Specify that the listed variables are CHARACTER*8.
FORMAT      - Specify the Fortran format if FILEF=FIXED is in effect. NOOB should be set.
LABEL       - Optionally supply a 1-40 character label.
RENAME      - Rename a variable. Used when b34sexec data set; is used to read the current dataset and process it further.
BANK        - Access a Wharton-style data bank. Not used much.
GEN         - Build a variable.
COMMENT     - Supply a comment to identify the data step.
DATACARDS   - Header for in-stream data.
PGMCARDS    - Header for in-stream data; the data will echo in the log.
B34SRETURN  - Footer for in-stream data.
B34SEEND    - End of the B34S data loading step. B34SRUN$ can also be used.

Note: If data are not read from a DMF file, either the INPUT sentence or the BANK sentence is required unless the READCROSS option has been specified. If the BANK sentence is specified, the GEN sentence cannot be used in the same DATA paragraph, since the variable names are not known to the system at parse time. The BANK sentence allows data loading from a specific data bank format supported by Wharton Econometrics. It is not used by most users and may be removed and/or modified at a later date. If data are not read, only generated, the INPUT sentence is not needed BUT the BUILD sentence is required.

DATA command examples for common applications.
The simplest way to load two series x and y into B34S is:

b34sexec data$
input x y$
datacards$
11 22
33 44
55 66
b34sreturn$
b34seend$

If a variable z is desired such that z=2*x*y, then use:

b34sexec data$
input x y$
build z$
gen z=2*x*y$
datacards$
11 22
33 44
55 66
b34sreturn$
b34seend$

DATA sentence options.

SET - Loads the current data set and allows new generated variables to be added. For usage, see example 7 under OPTIONS, or the simple example below, where z is built from the variables x and y of a previously built dataset:

b34sexec data$
input x y$
datacards$
11 22
33 44
55 66
b34sreturn$
b34seend$
b34sexec data set$
build z$
gen z=x*y$
b34srun$

CORR - Output the correlation matrix of the variables. If a variable has no variance, its correlation with all other variables is set to 0.0, and a variable is assumed to be correlated 1.0 with itself even when it has no variance. These two conventions contrast with the approach that sets the correlation to missing() if any variable in the calculation has zero variance.
Note: The CORR option is intended for a quick and compact look at the correlation matrix. Because of these design constraints, accuracy can be lost in extreme cases. For high-accuracy calculations the data should be loaded into the MATRIX command and the CCF command used.

COV - Output the variance-covariance matrix.

NOTIME - Suppress timing data.

REWIND - Rewind the unit specified with UNIT before reading data. If the input file has been set with FILE(' '), unit 10 will be used.

TIME - Give timing information. (This is the default.)

NOHEAD - Suppress the first two pages of B34S output to save paper. This is the default.

HEAD - Give the first two pages of output. This is needed if the LIST=key option is used.

NOCONSTANT - Suppress automatic creation of the constant.

KEEPMISS - Keep missing data in the sample unless explicitly removed. For further information see section 1.22.
This option is the default unless explicitly changed by DROPMISS. This switch is usually set on the OPTIONS sentence.

DROPMISS - Drop all observations read that contain missing data. For further detail see section 1.19. This switch is usually set on the OPTIONS sentence.

READMISS - Set the default input format on the DATA paragraph to FILEF=@@. See sections 1.19 and 13. This switch is usually set on the OPTIONS sentence.

DNREADMISS - Set the default input format on the DATA paragraph to FILEF=FREE. See sections 1.19 and 13. This switch is usually set on the OPTIONS sentence.

WRITECROSS - Write cross products and variable names on unit 35, to be read by a subsequent READCROSS option. If WRITECROSS has been specified, WRITETRANS cannot be used. Unit 35 must have been allocated as formatted.

WRITETRANS - Write data on unit 37 in format (I6, I2, 6E12.5), where I6 gives the observation number and I2 gives the card number. This option is rarely used.

READCROSS - Read cross products and variable names, written by a previous run, from unit UNIT. If this option is set, the user needs to supply NOOB, NVAR and UNIT. No further options or parameters need be set. If this option is used, ONLY the REGRESSION command can be used. Other commands will give unpredictable results, since no data have been saved.

Note: The WRITECROSS and READCROSS options only allow running very simple regressions with the REGRESSION command. The REGRESSION command will not support the RESIDUALA and RESIDUALP options, since no data are available. All other B34S commands will not work, since they require the raw data. The cross-product options are only useful if very large data sets are to be analysed at very low cost using OLS. The READCROSS option checks the NOOB and NVAR values supplied against those saved when WRITECROSS was given.
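Why a cross-product file is enough for OLS can be seen in a short sketch. OLS coefficients depend on the data only through the moment matrix (X'X and X'y), so a single pass over the data builds that matrix, and any regression among the saved variables can then be solved from it alone. Residuals, by contrast, need the raw data, which is why RESIDUALA and RESIDUALP are unavailable. The Python below illustrates only that arithmetic. It does not read or write the B34S cross-product file, and the helper names `cross_products`, `solve` and `ols_from_cross` are hypothetical.

```python
def cross_products(rows):
    """One pass over the data: accumulate the moment matrix Z'Z,
    where each row holds the values of all saved variables."""
    k = len(rows[0])
    zz = [[0.0] * k for _ in range(k)]
    for row in rows:
        for i in range(k):
            for j in range(k):
                zz[i][j] += row[i] * row[j]
    return zz


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_from_cross(zz, names, y, xs):
    """OLS of y on xs using only the saved moment matrix (no raw data)."""
    iy = names.index(y)
    ix = [names.index(v) for v in xs]
    xtx = [[zz[i][j] for j in ix] for i in ix]
    xty = [zz[i][iy] for i in ix]
    return solve(xtx, xty)


# Build the moment matrix once (y = 1 + 2x exactly, plus a constant column) ...
names = ["x", "y", "constant"]
data = [[x, 1.0 + 2.0 * x, 1.0] for x in [0.0, 1.0, 2.0, 3.0]]
zz = cross_products(data)

# ... then run several regressions from it, as Example 2 does with REGRESSION.
print(ols_from_cross(zz, names, "y", ["x", "constant"]))  # slope 2, intercept 1 (up to rounding)
print(ols_from_cross(zz, names, "y", ["constant"]))       # mean of y
```

The moment matrix here is k by k no matter how many observations are read. That is the point of building it only once for a very large data set.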
It is imperative that any DATA command using the READCROSS option terminate with the B34SRUN$ sentence in place of the B34SEEND$ sentence, so that variable names are initialized for subsequent REGRESSION paragraphs. The cross-product file has been made portable across machines. The header card gives the time and date when the file was made. The two jobs below show use of this facility. In recent years this facility has not been used much. Its value might be in a case where one had 10,000,000 or so observations and wanted to experiment with different models while building the moment matrix only once.

Example 1: Making a cross-product file:

/$ Tests writing of cross products
b34sexec options open('crossp.dat') unit(35) form=formatted
disp=unknown$ b34seend$
b34sexec options clean(35)$ b34seend$
b34sexec options include('c:\b34slm\gas.b34')$ b34srun$
b34sexec data set maxlag=3 writecross heading=('gas data') $
build l1gasin l2gasin l3gasin l1gasout l2gasout l3gasout$
gen l1gasin=lag1(gasin)$
gen l2gasin=lag2(gasin)$
gen l3gasin=lag3(gasin)$
gen l1gasout=lag1(gasout)$
gen l2gasout=lag2(gasout)$
gen l3gasout=lag3(gasout)$
b34seend$
b34sexec regression$
model gasout=gasin l1gasin l2gasin l3gasin l1gasout l2gasout l3gasout$
b34srun$

Example 2: Reading a cross-product file:

/$ Tests reading of cross products
b34sexec options open('crossp.dat') unit(35) form=formatted
disp=unknown$ b34seend$
b34sexec data readcross head noob=293 nvar=10 unit=35$
b34srun$
b34sexec regression$
model gasout=gasin l1gasin l2gasin l3gasin l1gasout l2gasout l3gasout$
b34srun$
b34sexec regression$
model gasout=gasin $
b34srun$

DATA sentence parameters.

LIST = key
    If key = FIRST5, the first 5 observations are listed. This is the default.
    If key = RAW, the original data are listed.
    If key = TRANS, the transformed data are listed.
    If key = BOTH, both the RAW and TRANS options are in effect.
    Note: This option requires that HEAD be set. It is useful if there is a data read-in problem, since series are listed as they are read.
UNIT = n1
    Sets the input unit. If specified, it must be 10 or a value above 30, unless the current data are being read from unit 8 in DP unformatted form. If the UNIT option is not specified, data must be input using the DATACARDS$ or PGMCARDS$ sentences. See the examples and the input discussion below. In place of UNIT, the keyword FILE(' ') can be used. If FILE is used, UNIT will default to 10 unless set to another unit. The file will stay open until closed with the statement

b34sexec options close(10)$ b34srun$

FILE(' ')
    Sets the B34S data file for input. The string ' ' can be up to 72 characters long. In place of FILE(' '), the user can open the file with an OPEN statement. The following two jobs are equivalent:

b34sexec options open('mydata') unit=10 disp=old$ b34seend$
b34sexec data unit(10)$
input x y z$
b34seend$

b34sexec data file('mydata')$
input x y z$
b34seend$

    Note: If the FILE parameter is used in a multistep job where the file to be read is built in a prior step, IT IS IMPERATIVE that B34SRUN be used in place of B34SEEND in the prior step, or that ALLRUN be placed in the autoexec.b34 file on the PC. If this is not done, the DATA step will not find the file.

NOOB = n2
    Sets the number of observations. If omitted, it defaults to the number of observations in the file. If the READCROSS option is set, NOOB must be set. If DATACARDS$ or PGMCARDS$ is used and NOOB is not set, B34S defaults to FILEF=@@ and checks each card for B34SRETURN$. This slows reading of the data, but in recent years it has been the most widely used reading option, since the missing codes NA, na, NaN and the SAS code . are recognized as missing data. If NOOB is set, FILEF=FREE is used, which is substantially faster since it uses Fortran free-format reading. Hence, to improve performance, it is recommended that NOOB be set when FILEF=@@ reading is not needed. If NOOB is set and a CHARACTER sentence is present, B34S will automatically set FILEF=@@ where FILEF=FREE would otherwise have been used.
    The fastest reading option is FILEF=FIXED, which was used for many years. It allows "jumping" over variables that are not needed or are in the wrong format. NOOB should be set.
    Warning: If NOOB is set and FILEF is not set to @@, it is not possible to process missing-data codes or to have more than one observation on a line.

FILEF = key
    If key = FREE, data will be read using IBM free-format routines. NOOB need not be set.
    If key = DP, data will be read as unformatted double precision.
    If key = FIXED, the FORMAT sentence must be used to specify the format.
    If key = CFIXED, the INPUT sentence must be modified to show the location of the data. Example: input x(1,2,3) y(1,4,4);
    If key = @@, more than one observation can be on a line. The LRECL of the file must be LE 512. This convention replaces the SAS missing-value code (a . with blanks on either side) with the B34S missing value. The missing-value codes NA, na and NaN are also supported. If this is used, the missing value may have to be recoded by a statement such as
        GEN IF(MYDATA .EQ. MISSING())MYDATA=-1.0 $
    so that overflows do not occur. If it is desired to drop these observations, the statement
        GEN X=DIFMISSING(MYDATA)$
    should be used. The keyword DIFMISSING stands for "delete if missing". For further information on missing data, see section 1.22.
    Note: FREE is the default for FILEF if NOOB NE 0; @@ is the default if NOOB has not been set. FILEF=@@ can read data of the form 1.2-3.2 as 1.2 and -3.2. With FILEF=FREE, spaces or commas must separate the numbers, since Fortran rules are in effect.
    If key = DMF, data will be read from the B34S DMF file allocated to UNIT = n using the unformatted convention.
    If key = FDMF, data will be read from the B34S formatted DMF file allocated to UNIT = n using the formatted convention.
    Note: B34S version 7.12g and beyond recognizes the appropriate file format, so FILEF=FDMF does not have to be set explicitly for formatted files.
    In cases where the operating system might not be able to detect the format of a file, DMF or FDMF can be set explicitly.
    Note: FILEF=DMF or FILEF=FDMF requires that the INPUT sentence be used if the DMF file contains more than 98 series and the user wants to read series other than the first 98. More detail on DMF files is given in section 40.0. For examples see section 13.7.

DMFMEMBER(k)
    Sets the DMF member name. If this parameter is omitted, the first member is read.

IBEGIN=n
    Sets the first observation to read from the DMF file. If this parameter is omitted, the first observation is read.

IEND=nn
    Sets the last observation to read from the DMF file. If this parameter is omitted, the last observation is read.

PROBNUM = n3
    Sets the problem number. If omitted, problems are numbered sequentially.

HEADING =(' ')
    Sets an optional heading. Maximum of 32 characters.

DEBUG = n4
    Sets the debug value in the range 0-9. This option is only useful for software developers.

WEIGHT = key
    Sets the weight variable. The variable specified by key must have been mentioned in the INPUT or BUILD sentence. This option does not change the data stored on unit 8. Let Z = weight variable, X = old variable and Y = new variable. The transformation used is:
        Y = X * (Z ** .5) * ((1 / mean Z) ** .5)

MAXLAG = n5
    Sets the maximum lag, so that the first n5 observations will not be used in subsequent analysis. MAXLAG must be specified if LAG or LP operators are used. It is imperative that MAXLAG be set correctly. The range for MAXLAG is 0-99.

NVAR=n6
    Sets the number of variables. Needed if the READCROSS option is set or if the BANK sentence is used. In other situations NVAR is picked up indirectly from the INPUT and BUILD sentences.

Date setting options. For further detail, see section 1.19 and the OPTIONS paragraph.

SETFREQ(R) - Sets the base frequency: 1. = annual data, .1 = data once per decade. R can be set as real OR integer.
SETYEAR(NN) - Sets the base year for annual data. Frequency assumed = 1.
SETMY(M1,Y1) - Sets the base month and year for monthly data. Frequency assumed = 12.
SETQY(Q1,Y1) - Sets the base quarter and year for quarterly data. Frequency assumed = 4.
SETDMY(D1,M1,Y1) - Sets the base day, month and year for daily data. Frequency assumed = 365.

IDVAR = xx
    Sets the character variable xx as an ID variable for each observation.

IDDATE= xx
    Sets the variable xx as a Julian date variable to identify each observation.

IDDATETIME=xx
    Sets the variable xx as a Julian date variable and indicates that time information is saved.

RMISSING( )
    Sets a data value that will be automatically coded into the B34S missing-value code. For other missing-value options, see section 1.22.

INPUT sentence.

INPUT X1 X2 X3 $

A maximum of 99 series can be input. B34S variable names are 1-8 characters long and follow SAS variable naming conventions. If the BUILD sentence is used, the total number of series in B34S must be LE 69. A CONSTANT variable is automatically added to the B34S data set unless the NOCONSTANT option was specified.

B34S has a limited ability to process CHARACTER data in release 6.23 and above. If any of the variables listed on the INPUT sentence are of CHARACTER type, they must also be listed on the CHARACTER sentence, which is discussed next. For further detail on CHARACTER variables, see section 13.1, "Processing Character Data in B34S."

If FILEF=CFIXED, the INPUT sentence must be modified to

INPUT X1(icard,icolstart,icolend) X2( ) $

For example, if X1 is on card 1 in columns 5-9, X2 is on card 1 in columns 23-30 and X3 is on card 2 in columns 6-12, the INPUT sentence would be:

INPUT X1(1,5,9) X2(1,23,30) X3(2,6,12)$

CHARACTER sentence.

CHARACTER Xn1 Xn3$

Variables containing character data must be listed on a CHARACTER sentence. For further detail on processing character data, see section 13.1. The CHARACTER sentence need not pass any variable names.

BUILD sentence.

BUILD Xk1 Xk2 $

The BUILD sentence allocates additional B34S variable names, which are then built with GEN sentences. See below for the form of GEN sentences.

FORMAT sentence.
FORMAT=('(4E16.8, 2X,18F2.0)') $

The FORMAT sentence is required if FILEF = FIXED. A maximum of 480 characters is allowed. If the FORMAT sentence extends over more than one card, be sure to stop in column 72 on each card.

LABEL sentence.

LABEL X1='MORE INFO ON VARIABLE X1' X2='MORE INFO ON VARIABLE X2' $

The LABEL sentence allows the user to provide more information on a variable than the 8-character name allows. Up to 40 columns of text can be provided inside the ' '. The LABEL sentence is optional.

RENAME sentence.

RENAME oldname = newname $

The RENAME sentence replaces a name in a dataset with a new name. It is usually used together with the SET option. Assume X1 is in the original dataset. The statements

b34sexec data set $
build y z$
gen y=x2*x2$
gen z=x2**2$
rename x1=newx1$
gen newx1=y/z$

allow the location of X1 to be reused. A RENAME without further GEN statements leaves the old values in place. RENAME statements are executed after INPUT and BUILD statements. Once a RENAME statement has been found, the old name is no longer available.

COMMENT sentence.

COMMENT=('Any text here') $

Any number of comment cards can be used. These will be printed by the B34S DATA step. A maximum of 78 characters is allowed.

DATACARDS, PGMCARDS and B34SRETURN sentences.

If the UNIT parameter is not specified to point B34S to a data unit for input, the DATACARDS$ or PGMCARDS$ sentences are used to input data. The former will not list the data cards in the log, while the latter will. The example below loads three series and builds another series that is the sum of the second and third series.

b34sexec data noob=5$
input x y z$
build ypz$
gen ypz = y + z$
datacards$
11 22 33
44 88 99
35 11 19
23 32 11
14 24 36
b34sreturn$
b34seend$
b34sexec rr$
model ypz = x$
b34seend$

Finally, a regression is run. Note that free format is used: since NOOB was specified, FILEF=FREE was implicitly used.
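Had NOOB been omitted in the job above, the default FILEF=@@ reading would apply. The Python below is a rough sketch of the documented @@ behaviour, not B34S's own reader:

- several observations may share a line;
- the tokens NA, na, NaN and an isolated . are treated as missing;
- a run such as 1.2-3.2 is split into 1.2 and -3.2;
- each card is checked for b34sreturn$.

NaN stands in for the B34S missing-value code, and `read_at_at` is a hypothetical name.

```python
import math
import re

# NaN stands in for the B34S missing-value code (an assumption of this sketch).
MISSING = math.nan
MISSING_TOKENS = {"NA", "na", "NaN", "."}
# A signed number: digits with an optional decimal part, or a leading-dot
# decimal, with an optional exponent. findall() therefore splits "1.2-3.2"
# into "1.2" and "-3.2".
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def read_at_at(cards, nvar):
    """Rough sketch of FILEF=@@ style reading of in-stream data cards."""
    values = []
    for card in cards:
        if card.strip().lower() == "b34sreturn$":  # every card is checked
            break
        for token in re.split(r"[\s,]+", card.strip()):
            if not token:
                continue
            if token in MISSING_TOKENS:
                values.append(MISSING)
            else:
                values.extend(float(t) for t in NUMBER.findall(token))
    if len(values) % nvar:
        raise ValueError("values do not fill a whole number of observations")
    # Regroup the stream of values into observations of nvar variables each.
    return [values[i:i + nvar] for i in range(0, len(values), nvar)]


# The data cards of the job above, with several observations on some lines.
cards = ["11 22 33 44 88 99", "35 11 19", "23 32 11 14 24 36", "b34sreturn$"]
obs = read_at_at(cards, nvar=3)
ypz = [y + z for _, y, z in obs]  # gen ypz = y + z$
print(ypz)  # [55.0, 187.0, 30.0, 43.0, 60.0]

print(read_at_at(["1.2-3.2 NA", "b34sreturn$"], nvar=3))  # [[1.2, -3.2, nan]]
```

A Fortran free-format read cannot make the 1.2-3.2 split or recognize the missing codes. This is why the faster FILEF=FREE reading needs separators and clean numeric data.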
The PGMCARDS$ sentence could have been substituted for the DATACARDS$ sentence. The DATACARDS and PGMCARDS sentences read columns 1-80 by default. If line numbers are present and a free-format read is used that reads more variables than fit on one line, problems will occur. The CARD72 option on the DATACARDS and PGMCARDS sentences in the DATA paragraph will remove line numbers. CARD72 was not used in the example above because it is not needed there.

13.1 Processing Character Data

Since version 6.23, B34S has had a limited ability to process character data in the B34S DATA command. The B34S MATRIX command has substantially more capability in this area, since more data types are supported and a number of facilities are available.

B34S character variables are limited to a maximum of 8 characters. Character data must be read with FILEF=FIXED or FILEF=@@. Character data can only be listed with the B34S LIST command. If a character string longer than eight characters is read with FILEF=@@, only the first 8 characters will be used. All character variables must be listed on the CHARACTER sentence. If character data are passed to a procedure other than LIST, or to a GEN statement in the DATA step, unpredictable results can occur unless the GEN function can process character data. Additional character capability may be added in future B34S versions.

B34S CHARACTER data cannot be saved in FSV files, nor can character data be explicitly saved in files with the SCAINPUT command. The reason for these restrictions is that SCA and RATS do not support these data types. Data sets containing character data can be modified using the SET option of the DATA command. Some of these restrictions may change in future releases. A few examples of character data are given below.
b34sexec data noob=3 filef=@@$
input x y$
character x$
datacards$
aa 111
bb 222
cc 333
b34sreturn$
b34seend$
b34sexec list$ b34seend$

b34sexec data noob=3 filef=fixed$
input x y$
character x$
format('(a2, f4.0)')$
datacards$
aa 111
bb 222
cc 333
b34sreturn$
b34seend$
b34sexec list$ b34seend$

b34sexec data$
input d m y$
character d$
datacards$
12a 12 86
12b.b 12 1986
c12 12 1902
_12c 12 1886
12.g 12 1786
1aa 1 2002
b34sreturn$
b34seend$
b34sexec list$ b34seend$

GEN statements for character data are limited to copy and logical operators. If character data are placed in other GEN statements, unpredictable results may occur. The example below illustrates what is possible.

b34sexec data $
input x y $
build xx test test1 test2 yy $
character x y yy xx$
gen yy='a'$
gen if(x.eq.yy) test=100$
gen if(x.eq.'a')test1=100$
gen if('a'.eq.x)test2=100$
gen xx='abcd'$
datacards$
a b
c c
e f
b34sreturn$
b34seend$
b34sexec list$ b34seend$

In all logical operations involving .eq., the result is NOT case sensitive. Dates can be converted to character representation using the GEN functions CHARDATE and CHARDATEMY. Extensive character processing can be done using the MATRIX command. The job below illustrates passing data to this command:

b34sexec data noob=3 filef=@@$
input x y$
character x$
datacards$
aa 111
bb 222
cc 333
b34sreturn$
b34seend$
b34sexec list$ b34seend$
b34sexec matrix;
call loaddata;
call names(all);
call tabulate(x,y);
b34srun;

Edited output produced:

B34S Matrix Command. Version December 2002.
Date of Run d/m/y 19/ 4/03. Time of Run h:m:s 12:16:38.
=> CALL LOADDATA$
=> CALL NAMES$
 #  Name      Type    Klass    Row-Col  Label
 1  X         Char*8  D1array  3 by 1
 2  Y         Real*8  D1array  3 by 1
 3  CONSTANT  Real*8  D1array  3 by 1
Space available 7869961 , used 122 , peak used 122
# Temp varibles 1 , peak # used 4
=> CALL TABULATE(X,Y)$
Obs   X       Y
  1   aa  111.0
  2   bb  222.0
  3   cc  333.0
B34S Matrix Command Ending. Last Command reached.
Space available in allocator 7869961, peak space used 122
Number variables used 4, peak number used 4
Number temp variables used 1, # user temp clean 0

13.2 Data Building Options - GEN sentence

GEN sentence.

GEN XNEW = FUNCTION(arg1,arg2, .... ) $
GEN XNEW = analytic statement here $

The GEN sentence allows variables to be built. Any variable names used must have been mentioned on prior INPUT or BUILD sentences. The same variable can be used over and over again as a temporary variable if desired. The GEN sentence only works for data loaded with the INPUT or BUILD sentences, since the BANK sentence does not know the variable names.

GEN sentence FUNCTIONS.

In all cases XNEW is the variable built, REALN = any real number (a maximum of 8 digits, including the decimal point, is allowed) and INUM = integer. Some functions use character input CHARVAR. It is important that argument sequences be followed exactly. All variables are initialized to 0.0 by the BUILD sentence.

Examples of GEN sentence functions:

gen x=missing();
gen y=sqrt(z);

Examples of GEN sentence analytic statements:

gen x=x*x$
gen y=(x+2.)/(j + kkk)$
gen vv = xx - jj + kk $
gen x =(x-lag(x))/lag(x) $
gen x =sin(y*q)/2.0 $
gen test5=log(exp(log(exp(5.0))))$

IF-THEN statements are allowed with the following operators:

.EQ. .NE. .LT. .LE. .GE. .GT. .AND. .OR.

Note that .GE. is recognized but . GE . (with embedded blanks) is not. In the form

gen if ( )_______$

the _______ must be either THEN or a valid analytic statement written without the GEN keyword.

Examples of IF-THEN statements:

gen if(x .eq. 2.0)y=x*x$
gen if(x.eq.2.0)y=x*x$
gen if(x.ne.y)then$
gen x=x**2$
gen q=q/x$
gen endif$

More complex statements can be used:

gen if(x .ne. y .and. z .lt. v)then$
gen x=x**2$
gen q=q/x$
gen endif$

Valid analytic expressions can be used inside the IF-THEN construction. For example:

gen if(x**2 .ne. y .and. sin(z) .lt. v)then$
gen x=x**2$
gen q=q/x$
gen xx=log10(2.0*exp(x))/dsqrt(log(x))$
gen endif$

The statement

gen if(x .eq. 2.0)y=x*x$

is allowed, but the statement

gen if(x .eq. 2.0)y=boxcox(x,2.0)$

is not allowed, since BOXCOX is not supported as part of an analytic statement. If the logic of the above statement is wanted, the correct form would be:

gen if(x .eq. 2.0)then$
gen y=boxcox(x,2.0)$
gen endif$

B34S functions allowed as part of analytic statements.

More data building capability is provided in the MATRIX command. See the more detailed help below.

Function    Example of use                  Task
ABS         gen y=abs(x);                   y=|x|
ASIN        gen y=asin(x);                  Arc sine
BETAPROB    gen y=betaprob(x1,x2,x3);       Beta probability
CDAY        gen day=cday();                 Character form of day
CHARDATE    gen cd1=chardate(julian);       Returns dd\mm\yy
CHARDATEMY  gen cd2=chardatemy(julian);     Returns mm\yyyy
CHARTIME    gen ctime=chartime(julian);     Returns hh:mm:ss
CHISQPROB   gen csprob=chisqprob(x1,x2);    Chi-square prob. of x1 with DF x2;
                                            .5 le x2 le 2000
CHTOREAL    gen nreal=chtoreal(xchar);      Converts character to real
CJULDAY     gen juldate=cjulday();          Julian date for that obs.
CMONTH      gen month=cmonth();             Month for that obs.
COS         gen y=cos(x);                   Cosine of x in y
CQT         gen quarter=cqt();              Quarter for that obs.
CYEAR       gen year=cyear();               Year for that obs.
DABS        gen y=dabs(x);                  y=|x|
DARCOS      gen y=darcos(x);                y=acos(x)
DASIN       gen y=dasin(x);                 y=asin(x)
DATAN       gen y=datan(x);                 y=atan(x)
DATAN2      gen y=datan2(x1,x2);            y=arctangent of x1/x2, using the
                                            signs of both to fix the quadrant
DCOS        gen y=dcos(x);                  Cosine of x in y
DCOSH       gen y=dcosh(x);                 Hyperbolic cosine of x in y
DELOBS      gen y=delobs();                 Deletes observation
DERF        gen y=derf(x);                  Error function; see Fortran manual
DERFC       gen y=derfc(x);                 Complementary error function;
                                            see Fortran manual
DEXP        gen y=dexp(x);                  y=e**x
DGAMMA      gen y=dgamma(x);                Integral from 0 to inf of
                                            u**(x-1)*e**(-u)du
DIFMISSING  gen y=difmissing(x);            Deletes obs. if x=missing
DINT        gen y=dint(x);                  Integer part of x
DLGAMA      gen y=dlgama(x);                Log gamma function
DLOG        gen y=dlog(x);                  Natural log of x in y
DLOG10      gen y=dlog10(x);                Log base 10 of x in y
DMAX1       gen y=dmax1(x1,x2);             y = max of x1, x2
DMIN1       gen y=dmin1(x1,x2);             y = min of x1, x2
DMOD        gen y=dmod(x1,x2);              y = remainder of x1/x2
DSIN        gen y=dsin(x);                  y = sin of x
DSINH       gen y=dsinh(x);                 y = hyperbolic sin of x
DSQRT       gen y=dsqrt(x);                 y = x**.5
DTAN        gen y=dtan(x);                  y = tan of x
DTANH       gen y=dtanh(x);                 y = hyperbolic tan of x
EXP         gen y=exp(x);                   y = e**x
EXTRACT     gen cy=extract(charvar,i,j);    cy=charvar(i:j);
                                            i, j must be in range 1-8
FDAYHMS     gen y=fdayhms(h,m,s);           Fraction of day given hour,
                                            minute, second
FIND        gen y=find(charvar,' ');        Finds location of blank in charvar
FPROB       gen y=fprob(fval,df1,df2);      Probability of F(df1,df2)
FYEAR       gen y=fyear(juldate);           Fraction of a year
GETDAY      gen y=getday(juldate);          Day of year
GETHOUR     gen y=gethour(juldate);         Hour of day
GETMINUTE   gen y=getminute(juldate);       Minute of day
GETMONTH    gen y=getmonth(juldate);        Month of year
GETQT       gen y=getqt(juldate);           Quarter of year
GETSECOND   gen y=getsecond(juldate);       Second of day
GETYEAR     gen y=getyear(juldate);         Year
IKOUNT      gen y=ikount();                 Number of obs. read
INT         gen y=int(x);                   y set to integer part of x
INVBETA     gen y=invbeta(x1,x2,x3);        Inverse of Beta distribution;
                                            x1 is probability
INVCHISQ    gen y=invchisq(x1,x2);          Inverse chi-squared;
                                            0 le x1 le 1.0, .5 le x2 le 2,000,000
INVFDIS     gen y=invfdis(x1,x2,x3);        x1 = probability; x2, x3 = DF
INVTDIS     gen y=invtdis(x1,x2);           x1 = probability; x2 = DF
JULDAYDMY   gen y=juldaydmy(day,m,year);    Gets Julian day
JULDAYQY    gen y=juldayqy(qt,year);        Gets Julian day
JULDAYY     gen y=juldayy(year);            Gets Julian day
KOUNT       gen y=kount();                  Gets observation number
LAG         gen y=lag(x);                   y(t)=x(t-1)
LAGn        gen y=lagn(x);                  y(t)=x(t-n)
LOG         gen y=log(x);                   y=natural_log(x)
LOG10       gen y=log10(x);                 y=log10(x)
MAKEINT     gen y=makeint(x);               y=integer part of x
MISSING     gen y=missing();                y set to missing
MOVELEFT    gen chnew=moveleft(chold,n);    chold moved left n
MOVERIGHT   gen chnew=moveright(chold,n);   chold moved right n
NCCHISQ     gen y=ncchisq(x1,x2,x3);        Noncentral chi-square:
                                            x1 = variable GE 0 for which to
                                            calculate probability;
                                            x2 = degrees of freedom (GE .5);
                                            x3 = noncentrality;
                                            .5 LE (x2+x3) LE 200000
NORMDEN     gen y=normden(z);               y = density of normal distribution
NOT         gen y=not(x);                   x=1.0 => y=0.0; x=0.0 => y=1.0
NOTFIND     gen y=notfind(chold,' ');       y = location of first nonblank
PLACE       gen chnew=place(ch,i,j);        chnew(I:I+J-1)=ch(1:J-I+1);
                                            i, j must be in range 1-8
PROBIT      gen y=probit(p);                Inverse normal distribution;
                                            p = probability
PROBNORM    gen y=probnorm(z);              Normal distribution probability
REALTOCH    gen chav=realtoch(real);        Converts real*8 to Ch*8
REC         gen y=rec();                    Variable from rectangular distribution
RECCS       gen y=reccs();                  Random rectangular number using
                                            common seed. Only appropriate if
                                            option RECVER(RAND) is in effect
RN          gen y=rn();                     y = random normal deviate
SIN         gen y=sin(x);                   y = sin of x
SQRT        gen y=sqrt(x);                  y = x**.5
TIMESPI     gen y=timespi(x);               y = x*pi
TPROB       gen y=tprob(tval,df);           y = probability of tval given DF df

Detailed discussion and examples of data building functions. Note: not all of these functions are allowed in an analytic statement.
The first column gives the old B34S TG number.

TG #  Command                             Description
   1  GEN XNEW = SQRT(XOLD) $             Square root
   2  GEN XNEW = LOG(XOLD) $              Natural log
   2  GEN XNEW = BOXCOX(XOLD,REALN) $     XNEW=((XOLD**REALN)-1.)/REALN
   3  GEN XNEW = LOG10(XOLD) $            Log to base 10
   4  GEN XNEW = EXP(XOLD) $              XNEW = e ** XOLD
   5  GEN XNEW = POWER(XOLD1,XOLD2) $     XNEW = XOLD1 ** XOLD2
   6  GEN XNEW = POWER(REALN,XOLD) $      XNEW = REALN ** XOLD
  10  GEN XNEW = POWER(XOLD,REALN) $      XNEW = XOLD ** REALN
   7  GEN XNEW = INV(XOLD) $              XNEW = 1.0 / XOLD
   8  GEN XNEW = ADD(XOLD,REALN) $        XNEW = XOLD + REALN
  11  GEN XNEW = ADD(XOLD1,XOLD2) $       XNEW = XOLD1 + XOLD2
   9  GEN XNEW = MULT(XOLD,REALN) $       XNEW = XOLD * REALN
  13  GEN XNEW = MULT(XOLD1,XOLD2) $      XNEW = XOLD1 * XOLD2
  12  GEN XNEW = SUB(XOLD1,XOLD2) $       XNEW = XOLD1 - XOLD2
  14  GEN XNEW = DIV(XOLD1,XOLD2) $       XNEW = XOLD1 / XOLD2
  15  GEN XNEW = GE(XOLD,REALN) $         XNEW = 1 if XOLD GE REALN
  16  GEN XNEW = GE(XOLD1,XOLD2) $        XNEW = 1 if XOLD1 GE XOLD2
  17  GEN XNEW = ASIN(XOLD) $             XNEW = ASIN(XOLD)
  18  GEN DEL(XOLD) $                     If XOLD = 0, observation dropped
  19  GEN DEL(XOLD,REALN) $               If XOLD = REALN, obs. dropped
  20  GEN XNEW = SIN(XOLD) $              XNEW = SIN(XOLD)
  21  GEN XNEW = COS(XOLD) $              XNEW = COS(XOLD)
  22  GEN XNEW = LAG(XOLD) $              XNEW = lag of XOLD
  22  GEN XNEW = LAG01(XOLD) $            XNEW = lag of XOLD
  22  GEN XNEW = LAG1(XOLD) $             XNEW = lag of XOLD
  22  GEN XNEW = LAG6(XOLD) $             XNEW = lag 6 of XOLD
  23  GEN GOTOTGN(XNEW,ITGN) $            Goes to TG number ITGN if XNEW=1.0;
                                          number set as GEN CONTINUE(ITGN) $
  24  GEN CONTINUE( ) $                   Label inside ( )
  25  GEN DOWN(XOLD,II) $                 Jump down II TG spaces if XOLD=1.0
  26  GEN UP(XOLD,II) $                   Jump up II TG spaces if XOLD=1.0
  27  GEN XNEW = GT(XOLD1,XOLD2) $        If XOLD1 GT XOLD2 then XNEW = 1.0
  28  GEN XNEW = GT(XOLD1,REALN) $        If XOLD1 GT REALN then XNEW = 1.0
  29  GEN XNEW = LT(XOLD1,XOLD2) $        If XOLD1 LT XOLD2 then XNEW = 1.0
  30  GEN XNEW = LT(XOLD1,REALN) $        If XOLD1 LT REALN then XNEW = 1.0
  31  GEN XNEW = LE(XOLD1,XOLD2) $        If XOLD1 LE XOLD2 then XNEW = 1.0
  32  GEN XNEW = LE(XOLD1,REALN) $        If XOLD1 LE REALN then XNEW = 1.0
  33  GEN XNEW = NE(XOLD1,XOLD2) $        If XOLD1 NE XOLD2 then XNEW = 1.0
  34  GEN XNEW = NE(XOLD1,REALN) $        If XOLD1 NE REALN then XNEW = 1.0
  35  GEN READ( ) $                       Read a new observation at once
  36  GEN XNEW = COPYV(XOLD1) $           XNEW = XOLD1
  37  GEN XNEW = COPYV(REALN) $           XNEW = REALN
  38  GEN GOTO( ) $                       Go to GEN label ( ) at once
  39  GEN XNEW = AND(XOLD1,XOLD2) $       If XOLD1=XOLD2=1.0 then XNEW = 1.0
  40  GEN XNEW = OR(XOLD1,XOLD2) $        If XOLD1 or XOLD2 = 1.0, XNEW = 1.0
  41  GEN XNEW = DABS(XOLD) $             XNEW = ABS(XOLD)
  42  GEN XNEW = NOT(XOLD) $              If XOLD=1.0 => XNEW=0.0;
                                          if XOLD=0.0 => XNEW=1.0;
                                          if XOLD NE 0.0 & NE 1.0, XNEW = 1.D+32
  43  GEN XNEW = DINT(XOLD) $             XNEW = integer part of XOLD
  44  GEN XNEW = DELOBS() $               Deletes observation
  45  GEN XNEW = DIFMISSING(XOLD) $       If XOLD = missing, obs. is dropped
  46  GEN XNEW = EQ(XOLD,REALN) $         If XOLD EQ REALN, XNEW = 1
  47  GEN XNEW = EQ(XOLD1,XOLD2) $        If XOLD1 = XOLD2, XNEW = 1
  48  GEN XNEW = KOUNT() $                XNEW = observation number
  49  GEN XNEW = IKOUNT() $               XNEW = number of observations read
  50  GEN KOUNTDEL(REALN) $               If KOUNT = REALN, obs. deleted
  51  GEN IKOUNTDEL(REALN) $              If IKOUNT = REALN, obs. deleted
  52  GEN BACKSPACE() $                   If KOUNT NE NOOB, backspace 5
  53  GEN XNEW = REC() $                  Generates rectangular number
  54  GEN XNEW = RECCS() $                Generates rectangular number using
                                          common seed. Only appropriate if the
                                          option RECVER(RAND) is in effect
  57  GEN XNEW = RN() $                   Generates random normal number
                                          having mean = 0.0 and SD = 1.0
  59  GEN XNEW = BUILDS(REALN1,REALN2) $  Builds a seasonal in XNEW from
                                          1 - REALN2, starting in REALN1.
                                          Maximum value for REALN1 = 99.
  60  GEN XNEW = DTAN(XOLD) $             XOLD LE (2**50) * pi
  61  GEN XNEW = DARCOS(XOLD) $           ABS(XOLD) LE 1.0
  62  GEN XNEW = DATAN(XOLD) $            XOLD = any real number
  63  GEN XNEW = DATAN2(XOLD1,XOLD2) $    XOLD1, XOLD2 any real numbers,
                                          not both 0
  64  GEN XNEW = DSINH(XOLD) $            XOLD LE 175.366;
                                          XNEW=(e**XOLD - e**-XOLD)/2
  65  GEN XNEW = DCOSH(XOLD) $            XOLD LT 175.366;
                                          XNEW=(e**XOLD + e**-XOLD)/2
  66  GEN XNEW = DTANH(XOLD) $            XOLD = any real number
  67  GEN XNEW = DGAMMA(XOLD) $           2**(-252) LE XOLD LE 2**(252);
                                          integral from 0 to inf of
                                          u**(XOLD-1)*e**(-u)du
  68  GEN XNEW = DLGAMA(XOLD) $           Log gamma function
  69  GEN XNEW = DERF(XOLD) $             Error function; see FORTRAN manual
  70  GEN XNEW = DERFC(XOLD) $            Complementary error function;
                                          see FORTRAN manual
  71  GEN XNEW = TIMESPI(XOLD) $          XNEW = XOLD * pi
  72  GEN XNEW = PROBNORM(XOLD) $         XNEW = probability of normal
                                          distribution. Can be calculated as
                                          .5+.5*DERF(XOLD/DSQRT(2.0))

Note: The density of the normal distribution is calculated by
GEN DEN=DEXP(-1.0*(Z*Z)/2.0)/(DSQRT(TIMESPI(2.0)))$
or by use of the NORMDEN function.

  73  GEN XNEW = DMAX1(XOLD1,XOLD2) $     XNEW = max of XOLD1, XOLD2
  74  GEN XNEW = DMIN1(XOLD1,XOLD2) $     XNEW = min of XOLD1, XOLD2
  75  GEN XNEW = DMOD(XOLD1,XOLD2) $      XNEW = remainder of XOLD1/XOLD2
  88  GEN XNEW = MISSING() $              XNEW = missing value
  89  GEN XNEW = PROBIT(XOLD) $           XNEW = inverse normal of XOLD;
                                          XOLD must be in range (0.0, 1.0)

Implicit date capability.

Assuming the user has set a base date on the OPTIONS card or in the DATA paragraph using the functions SETYEAR, SETMY or SETDMY and SETFREQ, the commands

  76  GEN JULDATE=CJULDAY()$
  77  GEN DAY=CDAY()$
  78  GEN MONTH=CMONTH()$
  79  GEN YEAR=CYEAR()$
  80  GEN QUARTER=CQT()$

obtain the relative Julian date information for that observation.

Explicit date capability.
Given data for the arguments, the date can be manipulated with the commands:

TG #   Command                                 Description
81     GEN JULDATE=JULDAYDMY(DAY,MONTH,YEAR)$
82     GEN JULDATE=JULDAYQY(QUARTER,YEAR)$
83     GEN JULDATE=JULDAYY(YEAR)$
84     GEN DAY=GETDAY(JULDATE)$
85     GEN MONTH=GETMONTH(JULDATE)$
86     GEN YEAR=GETYEAR(JULDATE)$
87     GEN QUARTER=GETQT(JULDATE)$
95     GEN XNEW=FYEAR(JULDATE)$                Gets fraction of a year, such as 1958.5
96     GEN XNEW=GETHOUR(JULDATE)$
97     GEN XNEW=GETSECOND(JULDATE)$
98     GEN XNEW=GETMINUTE(JULDATE)$
111    GEN XNEW=FDAYHMS(HOUR,MINUTE,SECOND)$   Sets fraction of a day

The commands GETHOUR, GETMINUTE, GETSECOND and FDAYHMS truncate HOUR, MINUTE and SECOND to integer values in the ranges (0-24), (0-60) and (0-60), respectively. B34S saves dates as integer Julian day numbers; the time of day is carried as a fraction added to the day number. For example, 12:00 noon on 30/9/1942 would be set as

GEN JULDATE=JULDAYDMY(30,9,1942)+.5 $

This could also be done as

GEN JULDATE = JULDAYDMY(30,9,1942) + FDAYHMS(12,0,0) $

The following commands can manipulate character variables.

TG #   Command                                 Description
99     GEN XNEW=CHTOREAL(CHAR)$                Converts a character number to a real number.
                                               Use with caution, since CHAR must be a number.
108    GEN CHAR=REALTOCH(REAL)$                Converts a real number to a character number.
                                               Use with caution, since REAL must fit inside
                                               8 characters.
100    GEN CHXNEW=EXTRACT(CHARVAR,I,J)$        CHXNEW=CHARVAR(I:J)
                                               I, J must be in range 1-8
101    GEN CHXNEW=PLACE(CHARVAR,I,J)$          CHXNEW(I:J)=CHARVAR(1:J-I+1)
                                               I, J must be in range 1-8
102    GEN CHXNEW=MOVERIGHT(CHARVAR,N)$        CHARVAR moved right N
                                               N must be LE 8
103    GEN CHXNEW=MOVELEFT(CHARVAR,N)$         CHARVAR moved left N
104    GEN CHXNEW=CHARDATE(JULIAN)$            Produces dd\mm\yy
122    GEN CHXNEW=CHARDATEMY(JULIAN)$          Produces mm\yyyy
123    GEN XNEW=IWEEK(JULIAN)$                 Produces 1=Monday etc.
124    GEN CHXNEW=CWEEK(JULIAN)$               Produces 'Monday' etc.
105    GEN CHXNEW=CHARTIME(JULIAN)$            Produces hh:mm:ss
106    GEN XNEW=FIND(CHARVAR,' ')$             First location of ' ' (0 if not found)
107    GEN XNEW=NOTFIND(CHARVAR,' ')$          First location where ' ' is not found
109    GEN CHXNEW=MASKADD(CHVAR1,CHVAR2)$      Copies CHVAR1 and places a character from
                                               CHVAR2 in a column only if that column is
                                               blank in CHVAR1 and non-blank in CHVAR2.
110    GEN CHXNEW=MASKSUB(CHVAR1,CHVAR2)$      Copies CHVAR1, putting blanks only where
                                               there are non-blank characters in CHVAR2.

The following example illustrates EXTRACT, PLACE, MASKADD and MASKSUB.

/$ Tests Character routines
b34sexec data noob=4 filef=@@$
   input ch$
   build ch2 ch3 ch4 ch3p4 chm2 find1 findnblk$
   character ch ch2 ch3 ch4 ch3p4 chm2$
   gen ch2     =extract(ch,3,4)$
   gen ch3     =place(extract(ch,7,8),1,2)$
   gen ch4     =place(ch,7,8)$
   gen ch3p4   =maskadd(ch3,ch4)$
   gen chm2    =masksub(ch,ch2)$
   gen find1   =find(ch,'1')$
   gen findnblk=notfind(ch,' ')$
datacards$
abcdefgh
12345678
87654321
hgfecba
b34sreturn$
b34seend$
b34sexec list$ var ch ch2 ch3 ch4$ b34seend$
b34sexec list$ var ch3p4 chm2 find1 findnblk$ b34seend$

Output from running this example follows:

 Listing for observation 1 to observation 4.

 OBS   CH         CH2        CH3        CH4
   1   abcdefgh   cd         gh               ab
   2   12345678   34         78               12
   3   87654321   65         21               87
   4   hgfecba    fe         a                hg

 Listing for observation 1 to observation 4.

 OBS   CH3P4      CHM2       FIND1        FINDNBLK
   1   gh    ab     cdefgh   0.00000000   1.0000000
   2   78    12     345678   1.0000000    1.0000000
   3   21    87     654321   8.0000000    1.0000000
   4   a     hg     fecba    0.00000000   1.0000000

90-94  GEN DCALLkey() $                    Dynamically call TG routine; key must be
                                           DRANV, DRANW, DRANX, DRANY or DRANZ.

The maximum lag that can be requested with the LAGnn functions is 99. The command GEN PASS(' ') passes text to the B34S low-level command parser without parsing. To assist FORTRAN 77 users, the following aliases are provided: DSQRT=SQRT, DLOG=LOG, DLOG10=LOG10, DEXP=EXP, DASIN=ASIN, DCOS=COS, DABS=ABS, SIN=DSIN, ABS=DABS.

Statistical functions in B34S.

TG #   Command                             Description
89     GEN XNEW = PROBIT(XOLD) $           XNEW = inverse normal of XOLD.
                                           XOLD must be in range (0.0, 1.0).
72     GEN XNEW = PROBNORM(XOLD) $         XNEW = probability of the normal
                                           distribution (normal CDF at XOLD). Can be
                                           calculated as .5+.5*DERF(XOLD/SQRT(2.0))
       GEN XNEW = NORMDEN(XOLD) $          XNEW = density of the normal distribution,
                                           XNEW = exp(-1.*z*z/2)/dsqrt(2*pi), z = XOLD
113    GEN XNEW = BETAPROB(xold1,xold2,xold3) $
                                           Computes probability XNEW that a variable
                                           having a beta distribution with parameters
                                           xold2 and xold3 is LE xold1. xold2 and
                                           xold3 must be GT 0.
114    GEN XNEW = INVBETA(xold1,xold2,xold3) $
                                           Inverse of beta distribution. xold1 is the
                                           probability.
115    GEN XNEW = CHISQPROB(xold1,xold2) $
                                           XNEW is the probability that a variable
                                           having a chi-squared distribution with xold2
                                           degrees of freedom is LE xold1.
                                           .5 LE xold2 LE 200000; xold1 GE 0.0.
116    GEN XNEW = INVCHISQ(xold1,xold2) $  Inverse chi-squared.
                                           0 LE xold1 LE 1.0; .5 LE xold2 LE 200000.
121    GEN XNEW = NCCHISQ(xold1,xold2,xold3) $
                                           Non-central chi-square.
                                           xold1 = variable (GE 0) for which to
                                                   calculate the probability
                                           xold2 = degrees of freedom (GE .5)
                                           xold3 = non-centrality
                                           .5 LE (xold2+xold3) LE 200000
117    GEN XNEW = FPROB(xold1,xold2,xold3) $
                                           F distribution probability.
                                           xold1 = F value (GE 0)
                                           xold2 = df numerator (GT 0)
                                           xold3 = df denominator (GT 0)
118    GEN XNEW = INVFDIS(xold1,xold2,xold3) $
                                           Inverse F distribution.
                                           xold1 = probability in range (0, 1)
                                           xold2 = df numerator (GT 0)
                                           xold3 = df denominator (GT 0)
119    GEN XNEW = TPROB(xold1,xold2) $     Probability of t distribution.
                                           xold1 = t value
                                           xold2 = df (GT 0)
                                           If xold1 = 1.966 and xold2 = 10000,
                                           we get .95.
120    GEN XNEW = INVTDIS(xold1,xold2) $   Inverse t distribution.
                                           xold1 = probability
                                           xold2 = df (GT 0)

Note: At low levels of probability, the INVFDIS command may fail due to overflows.

13.3 Notes on Analytic Statements

In addition to the function-driven GEN statements, B34S also allows analytic GEN statements. Since analytic GEN statements are expanded by the B34S parser, there is no longer a one-to-one correspondence between GEN statements and the underlying TG statements. Hence the GEN statements UP, BACKWARD, DOWN or FORWARD may not work as expected if any analytic statements are within the jump range. Analytic GEN statements parse more slowly but are easier to use. The operators +, -, *, / and ** are allowed, as are IF-THEN structures.

All variables must have been mentioned on an INPUT or BUILD sentence. Statements are evaluated from left to right, and expansion first takes place inside ( ). A limited number of B34S functions are allowed inside an analytic statement, and function arguments can themselves be analytic statements. Examples of analytic statements are given below.

gen x=x*x$
gen y=(x+2.)/(j + kkk)$
gen vv = xx - jj + kk $
gen x =(x-lag(x))/lag(x) $
gen x =sin(y*q)/2.0 $
gen test5=log(exp(log(exp(5.0))))$

Note: Caution for users if CALMATH is in effect.
Users are cautioned to make use of ( ) to be sure that an analytic statement is evaluated as desired. For example, use statements of the form x = ( y * (p + q))$.

Under CALMATH, B34S evaluates an expression in the following order. First, all constants are made into temporary variables of the form ######__, where __ runs from 01 to 40. Next, any functions such as SIN are evaluated in place. Next, the innermost ( ) is found and evaluated. After all ( ) are reduced, the expression is evaluated from left to right in a manner similar to a simple hand-held calculator. This convention differs from the FORTRAN convention, which first evaluates **, then * and /, and finally + and -.

Users are cautioned to check the results of all analytic calculations carefully by listing the data with the B34SEXEC LIST command (possibly with IEND=10 if there are too many observations) before proceeding to the analysis. The user can inspect the STACK during expansion of a GEN sentence by placing the following line first in the control file:

b34sexec options debugsubs(pstack)$ b34seend$

Examples of statements where unintended results can occur are given next. Using the B34S CALMATH option, the expression

gen a=17+16*3**2$

results in a=9801. Using ( ), the expression can be written as

gen a9801=(((17+16)*3)**2)$

which will resolve to 9801 whether CALMATH or FORTMATH is in effect. Note that with CALMATH in effect, B34S evaluates the expression as if it were entered into a calculator from left to right. In contrast, FORTRAN would evaluate the expression as if it were written

gen a = 17+(16*(3**2))$

and obtain a=161. Under the FORTMATH convention, the expression

gen b = 12-3*4/2$

would be seen as if it were written

gen b = 12-((3*4)/2)$

giving b=6, while B34S using the CALMATH option would evaluate it as

gen b = (12-3)*4/2$

giving b=18. A good rule to follow is to use ( ) so that it is clear what is desired.
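To make the difference between the two conventions concrete, the following Python sketch reproduces the two examples above. It is an illustration under stated assumptions, not the B34S parser: it assumes the expression contains only numeric constants and the binary operators +, -, *, / and ** (no variables, functions, parentheses or leading minus signs), and the helper names calmath and fortmath are hypothetical. The calmath helper applies each operator as it is met, like a hand-held calculator; fortmath resolves ** first, then * and /, then + and -, working left to right within each level. A chain such as 2**3**2 is therefore resolved left to right, which differs from the FORTRAN rule for **, so the sketch should not be relied on for that case.

```python
import operator
import re

# Binary operators recognised by this sketch (hypothetical helper, not B34S code).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}


def tokenize(expr):
    """Split e.g. '17+16*3**2' into numbers and operators ('**' matched before '*')."""
    return re.findall(r"\d+\.?\d*|\*\*|[-+*/]", expr.replace(" ", ""))


def calmath(expr):
    """CALMATH-style: apply every operator strictly from left to right."""
    toks = tokenize(expr)
    acc = float(toks[0])
    for op, num in zip(toks[1::2], toks[2::2]):
        acc = OPS[op](acc, float(num))
    return acc


def fortmath(expr):
    """FORTMATH-style: resolve **, then * and /, then + and -, left to right per level."""
    toks = tokenize(expr)
    vals = [float(t) for t in toks[0::2]]
    ops = toks[1::2]
    for level in (("**",), ("*", "/"), ("+", "-")):
        i = 0
        while i < len(ops):
            if ops[i] in level:
                # Collapse vals[i] <op> vals[i+1] into a single value.
                vals[i:i + 2] = [OPS[ops[i]](vals[i], vals[i + 1])]
                del ops[i]
            else:
                i += 1
    return vals[0]


for e in ("17+16*3**2", "12-3*4/2"):
    print(e, "CALMATH:", calmath(e), "FORTMATH:", fortmath(e))
```

Running the sketch prints 9801.0 and 161.0 for the first expression and 18.0 and 6.0 for the second, matching the values quoted above.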
While many "old style" programs follow the FORTRAN approach, the developer of B34S has found that many modern users who do not have a programming background obtain unintended results using the FORTRAN programming convention. As a result, B34S was designed to allow the simpler "calculator" approach. Users have told the developer that it is best to force the FORTRAN convention on users, and this suggestion has been taken. So as not to confuse FORTRAN users, the FORTMATH option is now the default for all GEN statements. This option forces the B34S GEN statement parser to resolve from left to right, first replacing FUNCTIONS, next replacing ( ) expressions, next replacing ** expressions, next replacing * and / expressions, and finally replacing + and - expressions. The B34S MATRIX command follows the FORTRAN convention 100%. The discussion of the differences between the CALMATH and FORTMATH options is included to alert users to how an expression is parsed.

All power functions require a positive base. For example, if X=-2.0, Y = X**Q is not allowed. The reason for this "harsh" limit is that while (-2)**3 = -8, (-2)**3.00000000000001 enters the complex domain. If the user knows that the exponent will be an integer, the code

gen y=abs(x)**3$
gen if(x.lt.0.0)y=-1*y$

or, in the general case,

gen y=abs(x)**e$
gen if(x.lt.0.0.and.dint(e/2).ne.(e/2))y=-1*y$

will give the desired result.

Multiple operations.

After expansion, a GEN statement should resolve to an assignment of the form gen x = value; an analytic statement such as gen x = y*z; is seen as "variable operator variable". A statement of the form gen x = -y+x; will not work as intended, since it is seen as "operator variable operator variable". The correct form is

gen x = -1. * y + x;

For example, the statements

b34sexec options ginclude('gas.b34'); b34srun;
b34sexec data set;
   build yy;
   gen yy=-gasout*gasin;
b34srun;

produce the following error message after gasout*gasin is resolved to a temporary variable:

 Note: FORTRAN calculation hierarchy used in GEN sentence.
 ERROR: Argument error to analytic expression.
        Check for multiple operators or = signs.
        Problem was with: = - ######01

Examples of GEN statements

The user wants to delete every observation in which X=4.0, Y=10.0 or Z=6.0:

gen if(x.eq.4.0.or.y.eq.10.0.or.z.eq.6.0)x=delobs()$

The user wants to delete all observations where X is missing:

gen x=difmissing(x)$

The user wants to build lag variables. On the DATA step, set MAXLAG=n, where n is the maximum lag:

gen xlag1=lag(x)$
gen xlag9=lag9(x)$

If the user knows in advance that all X values are positive, the code below will delete the appropriate number of observations, since it makes use of the fact that all variables are initialized to zero by B34S and a lag before the start of the data will be set to zero. The user will be warned if this type of approach is used. The developer recommends the MAXLAG approach, since X may actually have zero values.

gen xlag1=lag(x)$
gen xlag9=lag9(x)$
gen del(xlag9)$

The statement GEN LX = LAGn(X)$ places the nth lag of X in LX. Like SAS, B34S uses the current value of X at that point in the GEN sequence. Usually this is not a problem. The code below illustrates what occurs.

b34sexec data noob=10 maxlag=1 head $
   build x lagx lagx2$
   gen x=kount() $
   gen lagx = lag(x)$
   gen x=x+100$
   gen lagx2= lag(x)$
b34srun$
b34sexec list$ b34srun $

This produces the output:

 OBS     X           LAGX          LAGX2
   1   101.00000   0.00000000    100.00000
   2   102.00000   1.0000000     101.00000
   3   103.00000   2.0000000     102.00000
   4   104.00000   3.0000000     103.00000
   5   105.00000   4.0000000     104.00000
   6   106.00000   5.0000000     105.00000
   7   107.00000   6.0000000     106.00000
   8   108.00000   7.0000000     107.00000
   9   109.00000   8.0000000     108.00000

13.4 User Control of Data Building (supported in MVS and CMS only)
The GEN function DCALL allows the user to customize B34S to perform specialized data-building functions written in FORTRAN. DCALL allows a dynamic branch; all routines have the same argument list. Below is an example where variables 1 - 10 are divided by the square root of variable 11.

MVS SETUP

/*jobparm r=4096,t=1
// exec fortvce
//fort.sysin dd *
      subroutine dranv(dummy,ikount,kkount,kount,noob,obno,onnb,iflag,
     *inew,kode,ivar,bdex,mm,ilpmx,iotest,idbug)
      real*8 dummy(99),obno,onnb
c     divide variables 1-10 by the square root of variable 11
      do 100 i=1,10
  100 dummy(i) = dummy(i) / dsqrt(dummy(11))
      return
      end
//lked.syslib dd dsn=bec4346.#b34s.vsload,disp=shr,label=(,,,in)
//         dd dsn=sys1.vsfort.vlnkmlib.ver23,disp=shr
//         dd dsn=sys1.vsfort.vfortlib.ver23,disp=shr
//lked.syslmod dd dsn=&&lib2(dranv),disp=(mod,pass),
//            space=(trk,(2,2,2)),unit=scratch
// exec b34s
//go.b34slib dd dsn=&&lib2,disp=(old,delete)
//go.sysin dd *
b34sexec data unit=10$
   input x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11$
   gen dcalldranv() $
b34seend$

CMS SETUP

Step one: Compile DRANV FORTRAN. At UIC this is done with the command:
   FORTVCE DRANV (OPT(3)
Step two: Allocate the file. At UIC this is done with the command:
   FILEDEF B34SLIB DISK DRANV MODULE
Step three: Run B34S the usual way with the B34S CMS EXEC.

Discussion of arguments

   DUMMY  = array of size 99 containing the current observation
   IKOUNT = total number of observations read so far
   KKOUNT = IKOUNT - observations deleted by IFLAG
   KOUNT  = KKOUNT - number deleted by MAXLAG
   NOOB   = number of observations
   OBNO   = REAL*8 version of NOOB
   ONNB   = OBNO - 1.0
   IFLAG  = set = 0 to keep observation, set = 1 to delete
   IDBUG  = usually set to 0. Can be set by DEBUG

At the present time the DCALL feature works only on CMS and MVS. For complex data transformations, it is suggested that the data be moved to the MATRIX command for further processing and the series then moved back. Another possibility is to use SAS to build the data. For large cross-section work, the SAS option is often used.
For complex time series applications, the MATRIX command approach may be the way to proceed.

13.5 Random Number Generation and LP Capability

The function BACKSPACE allows B34S to generate NOOB random numbers by passing only one data card and building NOOB data points. For example:

b34sexec data noob=200$
   input rr $
   build rn1 rn2 $
   gen backspace()$
   gen rr = rec() $
   gen rn1 = rn() $
   gen rn2 = rn() $
datacards$
1111.0
b34sreturn$
b34seend$

will generate 200 observations, where RR is a random rectangular variable and RN1 and RN2 are random normal variables. In most cases, reading only one observation over and over does not gain the user anything. A better way to proceed is to use a job with only a BUILD sentence, which will run faster. An example is given below:

b34sexec data noob=200$
   build rr rn1 rn2 y$
   gen rr = rec() $
   gen rn1 = rn() $
   gen rn2 = rn() $
   gen y = 100.0 + (.5 * rr) + (.3 * rn1) + rn2$
b34seend$

In the above job, RR is a rectangularly distributed variable in the range 0.0 - 1.0, while RN1 and RN2 are random normal variables with mean 0.0 and SD 1.0. By default, the REC and RN commands call GGUBS and GGNML, which were obtained from the 8th edition of IMSL. If other routines are desired, they can be set using the OPTIONS command parameters RECVER and RNVER. The current autoexec.b34 file resets these to IMSL_1 and DRNNOA. For example:

   RECVER(GGUBS)  => IMSL routine GGUBS, the default.
   RECVER(RAND)   => The old IBM RAND routine.
   RECVER(RAN1)   => Numerical Recipes RAN1 (see page 196).
   RECVER(RAN2)   => Numerical Recipes RAN2 (see page 197).
   RECVER(RAN3)   => Numerical Recipes RAN3 (see page 199).
   RECVER(FORT90) => Fortran 90 random number generator.
   RECVER(IMSL_1) => IMSL Version 10 16807 generator.
   RECVER(IMSL_2) => IMSL Version 10 16807 generator, shuffled.
   RECVER(IMSL_3) => IMSL Version 10 397204094 generator.
   RECVER(IMSL_4) => IMSL Version 10 397204094 generator, shuffled.
   RECVER(IMSL_5) => IMSL Version 10 950706376 generator.
   RECVER(IMSL_6) => IMSL Version 10 950706376 generator, shuffled.
   RECVER(IMSL_7) => IMSL Version 10 recursion option.
   RNVER(GGNML)   => IMSL GGNML routine, the default.
   RNVER(GRAND)   => GRAND routine.
   RNVER(GASDEV)  => Numerical Recipes GASDEV (page 203).
   RNVER(GASDEV2) => Numerical Recipes GASDEV2 (page 203).
   RNVER(GASDEV3) => Numerical Recipes GASDEV3 (page 203).
   RNVER(GASDEV4) => FORT90 random number generator and GASDEV.
   RNVER(DRNNOA)  => IMSL-10 acceptance/rejection generator.
   RNVER(DRNNOR)  => IMSL-10 inverse CDF generator.

The GASDEV2 and GASDEV3 routines are modifications of GASDEV that call RAN2 and RAN3, respectively. RAN1 is probably the best of the three. RAN2 is fast but has the limitation that it can produce only 714025 distinct values. RAN3 is a "portable" routine. In Monte Carlo work, the ability to experiment with different generators and different seeds is important. The OPTIONS command SETSEED allows different starting values to be used. For further discussion of some of the considerations in the selection of a random number generator, see Numerical Recipes by Press, Flannery, Teukolsky and Vetterling, Cambridge University Press, 1989, chapter 7.

The B34S MATRIX command allows use of these generators. If RECVER and RNVER are set globally, they will also be set for the MATRIX facility; however, local changes can be made to the generators used in MATRIX command programs. The default settings are GGUBS and GGNML, which should be logically the same as the IMSL_1 and DRNNOR settings. The GGUBS and GGNML programs are available on all platforms. Extensive documentation for the random number generators is given in the IMSL documentation.
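The IMSL_1 to IMSL_6 settings are identified above only by their multipliers. As a hedged illustration of what such a generator does, the Python sketch below assumes the multiplicative congruential form x(n+1) = a * x(n) mod (2**31 - 1), with each state divided by 2**31 - 1 to give a rectangular number in (0, 1); with a = 16807 this is the Park-Miller "minimal standard" generator. The function names lehmer_stream and lehmer_uniform are hypothetical, and the sketch does not reproduce how B34S or IMSL seed, shuffle or otherwise manage the stream.

```python
M = 2**31 - 1  # modulus assumed for the multiplicative congruential generator


def lehmer_stream(seed, a=16807):
    """Yield successive states x(n+1) = a * x(n) mod (2**31 - 1).

    The seed must lie in 1 .. M-1; a zero seed would stay at zero forever.
    """
    if not 0 < seed < M:
        raise ValueError("seed must satisfy 0 < seed < 2**31 - 1")
    x = seed
    while True:
        x = (a * x) % M
        yield x


def lehmer_uniform(seed, n, a=16807):
    """Return n rectangular (uniform) numbers strictly inside (0, 1)."""
    stream = lehmer_stream(seed, a)
    return [next(stream) / M for _ in range(n)]


if __name__ == "__main__":
    print(lehmer_uniform(seed=1, n=3))
    # Changing the multiplier mimics experimenting with another setting.
    print(lehmer_uniform(seed=1, n=3, a=397204094))
```

Starting from seed 1, the first three states are 16807, 282475249 and 1622650073, and the 10000th state is 1043618065, the usual check value for the minimal standard generator. Passing a different multiplier, such as 397204094, is one way to experiment with the other multipliers listed above.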
The MATRIX command has been designed with serious numerical Monte Carlo calculations in mind.

Note: There is a limit of 999 GEN statements that do not use LAG or LP functions. The maximum number of LP and LAG functions is 150.

Linear processing with the LP function.

B34S allows the user to calculate a complex linear process of the form

XNEW = AR(XNEW) + MA(XOLD)

where AR is an autoregressive weighting function starting with lag 1 and MA is a moving average weighting function starting with lag 0. The general form of the command is

gen xnew = lp(nar,nma,xold) $

where NAR = number of AR terms (max 10), NMA = number of MA terms (max 10) and XOLD = the moving average variable.

Options and parameters for the GEN LP function include:

   GEOMETRIC         - generate geometric MA weights.
   PASCAL            - generate Pascal weights.
   LAMDA=r1          - sets the lambda weight for GEOMETRIC and PASCAL.
                       Default = .5. Maximum of 3 digits allowed.
   IORDER=n1         - order for Pascal weights. Default = 2.
   JUMP=n2           - number of weights truncated off the left-hand side.
   OVCON=n3          - overflow index for the AR system. The output is
                       tested against 10**n3. If more than 10 overflows
                       occur, the problem is stopped. OVCON defaults to 10.
   AR=(A1,A2)        - inputs AR weights if NAR NE 0.
   MA=(M1,M2)        - inputs MA weights if NMA NE 0 and GEOMETRIC and
                       PASCAL are not set.
   VALUES(val1,val2) - inputs initial values if NAR NE 0.

Example 1. Use third-order PASCAL weights for 7 terms with lambda = .4.

gen xnew=lp(0,7, xold) pascal iorder=3 lamda=.4 $

Example 2. Use 4 AR and 2 MA weights.

gen xnew=lp(4,2,xold) ar(.4,.3,.2,.1) ma(.5,.5) ovcon=100 values=(100.0,98.0,97.0,100.0) $

Usage note: The GEOMETRIC and PASCAL options assign MA=0.0 unless the JUMP sentence is used.

Example 3.
Uses the LP command to generate Box-Jenkins models:

   NORM   = a random series
   NORMD1 = an AR model having form (1.0 + .5B)x(t) = e(t)
   AR3    = an AR model having form (1.0 -.9B +.4B**2 +.3B**3)x(t)=e(t)
   MA3    = an MA model having form x(t)=(1.0-.7B+.8B**2+.4B**3)e(t)
   ARMA   = an ARMA model having form (1.0-.5B)x(t)=(1.0-.7B+.8B**2)e(t)

/$$ tests bjest
b34sexec data noob=3000 nohead heading('random and first diff') maxlag=3$
   build norm normd1 ar3 ma3 arma$
   gen norm=rn()$
   gen normd1=norm-lag1(norm)$
   gen ar3=lp(3,1,norm) ar(.9 -.4 -.3) ma(1.) values(0.0 0.0 0.0)$
   gen ma3 =lp(0,4,norm) ma(1. -.7 .8 .4) values(0.0)$
   gen arma=lp(1,3,norm) ar(.5) ma(1. -.7 .8) values(0.0)$
b34srun$
b34sexec bjiden$
   var norm normd1 ar3 ma3 arma$
   seriesn var=norm   name='white noise'$
   seriesn var=normd1 name='white noise differenced'$
   seriesn var=ar3    name='ar(3) '$
   seriesn var=ma3    name='ma(3) '$
   seriesn var=arma   name='arma(1,2) '$
   rauto norm normd1 ar3 ma3 arma$
b34srun$
b34sexec bjest$
   model normd1$
   modeln p=1 avepa=.1$
   forecast nf=10 nt=2995$
b34srun$
b34sexec bjest$
   model ar3 $
   modeln p=(1,2,3) avepa=.1$
   forecast nf=10 nt=2995$
b34srun$
b34sexec bjest$
   model ma3 $
   modeln q=(1,2,3) avepa=.1$
   forecast nf=10 nt=2995$
b34srun$
b34sexec bjest$
   model arma $
   modeln p=1 avepa=.1 q=(1,2)$
   forecast nf=10 nt=2995$
b34srun$

13.6 Examples of DATA Paragraph

The simplest way to input 5 observations on 3 series is:

b34sexec data$
   input x y z$
datacards$
1 11 111
2 22 222
3 33 333
4 44 444
5 55 555
b34sreturn$
b34seend$

The above example could be written as

b34sexec data$
   input x y z$
datacards$
1 11 111 2 22 222 3 33 333 4 44 444 5 55 555
b34sreturn$
b34seend$

since if NOOB is NOT set, the default reading option is FILEF=@@. If NOOB is set explicitly, FILEF=FREE is the default, which will read faster.

b34sexec data noob=5$
   input x y z$
datacards$
1 11 111
2 22 222
3 33 333
4 44 444
5 55 555
b34sreturn$
b34seend$

If NOOB is set and the data is all on one row or broken up, then FILEF=@@ must be set.
b34sexec data noob=5 filef=@@$
   input x y z$
datacards$
1 11 111 2 22 222 3 33 333 4 44 444 5 55 555
b34sreturn$
b34seend$

Note that FILEF=@@ is the most flexible reading format, but it is also the slowest. FILEF=FREE is faster, but slower than FILEF=FIXED. FILEF=DP is the fastest.

Column loading of data (FILEF=CFIXED) is illustrated next.

b34sexec data filef=cfixed noob=3;
   input x(1,1,3) y(1,4,4);
datacards;
1234
4321
9998
b34sreturn;
b34srun;
b34sexec list; b34srun;

b34sexec data filef=cfixed;
   input x(1,1,3) y(1,4,4);
datacards;
1234
4321
7778
b34sreturn;
b34srun;
b34sexec list; b34srun;

b34sexec data filef=cfixed;
   input x(1,1,3) y(1,4,6);
datacards;
1234
4321
999 .
b34sreturn;
b34srun;
b34sexec list; b34srun;

The next example shows data building.

b34sexec data noob=5$
   input x1 y1 z $
   build sumy1z $
   gen sumy1z = y1+z$
datacards$
11 22 33
33 44 33
333 3.0 666
22 33 11
11.2 33.4 22.2
b34sreturn$
b34seend$
b34sexec regression$
   model x1 = sumy1z$
b34srun$
/$
/$ we reload the data and build some more data.
/$ note that prior step ends with b34srun$, not b34seend$.
/$ optional labels have been supplied.
/$
b34sexec data set$
   build x1ty1 x1py1 xx$
   label x1ty1 = 'x1 * y1 '$
   label x1py1 = 'x1 + y1 '$
   label xx    = '(x1**3)/(sin(x1)+(2.0*cos(x1/y1)))'$
   gen x1ty1 = x1*y1$
   gen x1py1 = x1+y1$
   gen xx    = (x1**3)/(sin(x1)+(2.0*cos(x1/y1)))$
b34seend$
/$
/$ we list the data
/$
b34sexec list$
b34srun$

Data loading from a file on a PC

Assume X1,...,X6 are on a file MYDATA.DAT. The statements below will load the series and perform a regression after building data. Since there is no FILEF= parameter, FREE is assumed. If more than one observation is placed on one card, use FILEF=@@.
b34sexec options open('mydata.dat') unit=10 disp=old$
b34seend$
b34sexec data unit(10)$
   input x1 x2 x3 x4 x5 x6$
   build x1sq x1tx2$
   gen x1sq  =x1*x1$
   gen x1tx2 =x1*x2$
b34seend$
b34sexec regression$
   model x6=x1 x2 x3 x4 x5 x1sq $
b34seend$

The same job can be coded with the FILE( ) parameter as:

b34sexec data file('mydata.dat')$
   input x1 x2 x3 x4 x5 x6$
   build x1sq x1tx2$
   gen x1sq  =x1*x1$
   gen x1tx2 =x1*x2$
b34seend$
b34sexec regression$
   model x6=x1 x2 x3 x4 x5 x1sq $
b34seend$

13.7 Reading DMF Files

The B34S DATA paragraph can be used to load data from a B34S DMF file. Section 40.0 discusses how to create and maintain B34S DMF files. Since B34S has a limit of 98 variables and one constant, the DMF data library is provided as a means to store large numbers of data series and selectively read series into B34S. The current maximum number of series in a DMF file is 9999, although this can change in future releases. There can be multiple members in a DMF library. Members are selected using the DMFMEMBER( ) parameter on the DATA sentence. If DMFMEMBER( ) is not specified, the first member is read.

Assume that there are 880 series in the B34S DMF file MYDATA.DMF and that series X, Y and Z are to be loaded from the first member. The following commands will load the series:

b34sexec options open('c:\mysd\mydata.dmf') unit(60)$
b34seend$
b34sexec data filef=dmf unit(60)$
   input x y z$
b34seend$

If MYDATA.DMF was a formatted DMF file, the correct commands would be:

b34sexec options open('c:\mysd\mydata.dmf') unit(60)$
b34seend$
b34sexec data filef=fdmf unit(60)$
   input x y z$
b34seend$

Note: As of version 7.12g, B34S has been modified to detect whether the file is formatted or unformatted. Hence either FILEF=DMF or FILEF=FDMF can be used.
If member CRIME were to be loaded, the above two examples would be:

b34sexec options open('c:\mysd\mydata.dmf') unit(60)$
b34seend$
b34sexec data filef=dmf unit(60) dmfmember(crime)$
   input x y z$
b34seend$

If MYDATA.DMF was a formatted DMF file, the correct commands would be:

b34sexec options open('c:\mysd\mydata.dmf') unit(60)$
b34seend$
b34sexec data filef=fdmf unit(60) dmfmember(crime)$
   input x y z$
b34seend$

In place of the OPEN statements, the following can be used:

b34sexec data filef=fdmf file('c:\mysd\mydata.dmf') dmfmember(crime)$
   input x y z$
b34seend$

If MYDATA.DMF contained 80 series and the user wanted to load all 80 of them, the correct setup would be simply to omit the INPUT statement. If the DMF file contained more than 98 series and the INPUT statement was omitted, only the first 98 series would be read.

The IBEGIN and IEND options on the DATA sentence control whether all observations from the DMF file are read. Assume that the MYDATA.DMF file contains 2000 observations but the user wants to load only observations 23 through 1023. The correct commands would be

b34sexec options open('c:\mysd\mydata.dmf') unit(60)$
b34seend$
b34sexec data filef=dmf unit(60) dmfmember(crime) ibegin=23 iend=1023$
   input x y z$
b34seend$

assuming the member name was CRIME and only X, Y and Z were desired.

13.8 Data Bank Options

Note: The NBER data bank option is rarely used these days and might be removed from future B34S versions. The DMF facility has replaced the capability in the BANK command.

BANK sentence options.

   READ   - Sets up to read from the NBER data bank. This is the default.
   MERGE  - Sets up to merge data from one bank with data from another bank.
   LISTB  - Lists the data bank on unit IBNKU and the documentation on unit IDOCU.
   NODOC  - Suppresses the documentation listing. This option is not
            recommended. The default is to give documentation.

BANK sentence parameters.

   IBNKU=n1    Sets bank unit for data. Default = 19.
   IDOCU=n2    Sets documentation bank unit. Default = 20.
   IMDU =n3    Sets merge bank unit. Default = 24.
   IDOCM=n4    Sets merge documentation unit. Default = 20.
   IVARC=n5    Sets number of variables read from the main bank for a bank
               read. If omitted, it defaults to NVAR. If the MERGE option is
               specified, IVARC must be set LT NVAR; (NVAR-IVARC) series
               will be read from the merge bank.
   INDXB=n6    Sets first bank index number to read/write if the BI
               parameter is not used. It is recommended that the BI
               parameter be used.
   ISKP =n7    Sets number of observations to skip before reading data in
               the bank. The NBER data banks at UIC start in 1960.
   IY1  =n8    Sets last two digits of initial year. Default = 1.
   IP1  =n9    Sets frequency per year. Default = 1.
   IB1  =n10   Sets initial period of data. Default = 1.
   IMBCRC=n11  Sets number of records in merge bank. Defaults if
               IMKU = 19, 24 or 25.
   IMDCRC=n12  Sets number of records in merge documentation. Defaults if
               IMDU = 20, 26, 23 or 27.
   IPERD=n13   Sets number of observations per period of merge data.
   IMSKP=n14   Sets number of observations to skip for merge data. If both
               data banks start in the same period, IMSKP = IPERD * ISKP.
   BI=(n1,n2)  Sets index numbers of data bank.
   MI=(m1,m2)  Sets index numbers of merge data bank.

The example below lists the monthly, quarterly and yearly banks. It illustrates all JCL needed at UIC.
/*jobparm t=(,30),l=5,r=2048
// exec b34s
//go.ft24f001 dd dsn=bec4346.#nber.mo.data,disp=shr,label=(,,,in)
//go.ft20f001 dd dsn=bec4346.#nber.mo.docm,disp=shr,label=(,,,in)
//sysin dd *
b34sexec data$
* list monthly bank $
   bank listb ibnku = 24 idocu = 20$
b34seend$
// exec b34s
//go.ft19f001 dd dsn=bec4346.#nber.qt.data,disp=shr,label=(,,,in)
//go.ft23f001 dd dsn=bec4346.#nber.qt.docm,disp=shr,label=(,,,in)
//sysin dd *
b34sexec data$
* list qt bank $
   bank listb ibnku = 19 idocu = 23$
b34seend$
// exec b34s
//go.ft25f001 dd dsn=bec4346.#nber.yr.data,disp=shr,label=(,,,in)
//go.ft27f001 dd dsn=bec4346.#nber.yr.docm,disp=shr,label=(,,,in)
//sysin dd *
b34sexec data$
* list yr bank $
   bank listb ibnku = 25 idocu = 27$
b34seend $

The following example reads series 10, 20 and 30 from a quarterly bank and series 10, 11 and 16 from a monthly bank. The user knows the names of these series and on the reread renames them X1 X2 X3 XX1 XX2 XX3.

b34sexec data noob=100 noconstant nvar=6 ivarc=3 $
   bank merge iperd=3 ibmku=19 idocu=23 iskp=8 bi=(10 20 30) mi=(10,11,16) imku=24 imdu=20 imskp=24 $
b34seend$
b34sexec data unit=8 filef=dp$
   input x1 x2 x3 xx1 xx2 xx3 $
b34seend $