13.0 DATA Command Revised April 2003
Overview of capability.
The DATA command loads data in B34S using observation by
observation reading for up to 98 series.
If data is saved variable by variable, use the READVBYV
command, which is documented in section 43.
The SCAIO command, documented in section 53, reads and
writes SCA MAD files while the SCAINPUT command, documented
in section 31, reads and writes SCA FSAVE files or reads
and writes RATS(r) POR files.
If more than 98 series are needed, another way to load data
into B34S is to read the DATA into the MATRIX command
and generate a B34S DATA step for a subset of series. A still
further option is to save series in a dmf file and read an
extract of the series in the dmf file from the b34sexec data
command.
Note that the MATRIX command can read and write Speakeasy
portable checkpoints and Matlab save files.
Since the SAS(r) DATA step has substantial capability to
build and manage data files, B34S can be run under SAS(r)
rather than stand alone using the DATA command. The SAS to
B34S interface works on the Windows, Linux, RS/6000 and
Sun versions of B34S using supplied SAS Macros.
B34S DATA steps can be generated using SAS on platforms where
B34S does not run and ported to another platform which runs B34S.
The B34S DATA step can be used to load a subset of data from a
B34S DMF (Data Management Facility) file. For further detail see
section 13.7.
The B34S DATA command reads all data in observation by observation
format which is the preferred way. If the data is only available
in variable by variable format, the READVBYV command should be
used to load the data. Data loaded by this command can be modified
by a subsequent B34S DATA command using the SET statement.
Form of DATA command:
B34SEXEC DATA options parameters $
INPUT Var1 Var2 $
BUILD Vark Varj $
CHARACTER Vark Varj $
FORMAT=(' ') $
LABEL Var1=' '
var2=' ' $
RENAME Oldname=newname $
BANK options parameters $
GEN XNEW = FUNCTION( ) $
COMMENT=('This is a comment')$
DATACARDS $
(note: data cards are placed here)
PGMCARDS $
B34SRETURN $
B34SEEND $
Overview of sentences in the DATA command.
INPUT - Specify variables to be loaded.
BUILD - Specify names of variables to be built.
CHARACTER - Specify variable loaded is Character*8.
FORMAT - Specify Fortran format if filef=fixed is
in effect. NOOB should be set.
LABEL - Optionally supply a 1-40 character label.
RENAME - Allows a variable to be renamed. Used if
b34sexec data set;
is being used to read a current dataset and
further process the data.
BANK - Access Wharton-Style Bank. Not used much.
GEN - Build a variable.
COMMENT - Supply a comment to identify data step.
DATACARDS - Header for reading data.
PGMCARDS - Header for reading data, data will echo in log.
B34SRETURN - Footer for data.
B34SEEND - End of B34S DATA loading step. B34SRUN$ can
also be used.
Note: If not reading data from a DMF file either the INPUT
sentence or the BANK sentence is required unless the
READCROSS option has been specified. If the BANK sentence is
specified, the GEN sentence cannot be used in the same DATA
paragraph since the variable names are not known to the system at
parse time. The BANK command allows data loading from a specific
data bank format supported by Wharton Econometrics. This command
is not used by most users and may be removed and/or modified
at a later date.
If data is not read, only generated, the INPUT sentence is not
needed BUT the BUILD sentence is required.
DATA command examples for common applications.
The simplest way to load two series x and y into b34s is:
b34sexec data$
input x y$
datacards$
11 22
33 44
55 66
b34sreturn$
b34seend$
If a variable z is desired such that z=2*x*y, then use
b34sexec data$
input x y$
build z$
gen z=2*x*y$
datacards$
11 22
33 44
55 66
b34sreturn$
b34seend$
DATA sentence options.
SET - Will load the current data set and allow adding new
generated variables. For usage see example 7 under
OPTIONS or the simple example where z is built from
variables x and y in a previously built dataset.
b34sexec data$
input x y$
datacards$
11 22
33 44
55 66
b34sreturn$
b34seend$
b34sexec data set$
build z$
gen z=x*y$
b34srun$
CORR - Output correlation matrix of variables. If a variable
has no variance, the correlation between this variable
and all other variables is assumed to be 0.0. A variable
is assumed to be correlated 1.0 with itself even in the
case when the variable has no variance. These two
conventions are in contrast to the approach that sets the
correlation to missing() if any of the variables in the
calculation have zero variance.
Note: The CORR command is intended to be used for a quick,
fast and compact look at the correlation matrix. Due
to these constraints, accuracy can be lost in extreme
cases. For high accuracy calculations the
data should be loaded into the MATRIX command and the
CCF command used.
COV - Output variance covariance matrix.
NOTIME - Suppress timing data.
REWIND - Will rewind unit specified with UNIT prior to reading data.
If input file has been set with FILE(' ') unit 10 will be
used.
TIME - Give timing information. (This is the default).
NOHEAD - Suppress first two pages of B34S output to save paper.
This is the default.
HEAD - Gives first two pages of output. This is needed if LIST=key
option is used.
NOCONSTANT - Suppresses automatic constant creation.
KEEPMISS - Keeps missing data in the sample unless explicitly
removed. For further information see section 1.22.
This option is the default unless explicitly
changed by DROPMISS. This switch is usually set
on the OPTIONS sentence.
DROPMISS - Drops all observations read containing missing
data. For further detail see section 1.19. This
switch is usually set on the OPTIONS sentence.
READMISS - Sets the default input format on the DATA
paragraph to FILEF=@@. See section 1.19 and 13.
This switch is usually set on the OPTIONS sentence.
DNREADMISS - Sets the default input format on the DATA paragraph
to FILEF=FREE. See section 1.19 and 13.
This switch is usually set on the OPTIONS sentence.
WRITECROSS - Writes cross products and variable names on unit 35
to be read by subsequent READCROSS option. If WRITECROSS
has been specified, WRITETRANS cannot be used. Unit
35 must have been allocated as formatted.
WRITETRANS - Write data on unit 37 in format (I6, I2, 6E12.5) where
I6 gives the observation number and I2 gives the card #.
This option is rarely used.
READCROSS - Reads cross products and variable names from a previous
run off file on unit UNIT. If this option is set, user
needs to input NOOB, NVAR and UNIT. No further
options or parameters need be set. If this option is
used ONLY the REGRESSION command can be used. Other
commands will give unpredictable results since no data
has been saved.
Note: The WRITECROSS and READCROSS options only allow running very
simple regressions using REGRESSION command. The REGRESSION
command will not support the RESIDUALA and the RESIDUALP
options since no data is available. All other B34S commands will
not work since they require the raw data. The cross product
options are only useful if very large data sets are to be
analysed at very low cost using OLS. The READCROSS option
checks the NOOB and NVAR values supplied against those saved
when WRITECROSS was given. It is imperative that any DATA
command using the READCROSS option terminate with the
B34SRUN$ sentence in place of the B34SEEND$ sentence to
initialize variable names for subsequent REGRESSION
paragraphs.
The cross product file has been made portable across machines.
The header card gives the time and date when this file was made.
The below two jobs show use of this facility. In recent years
this facility has not been used much. Its value might be in a
case where one had 10,000,000 or so observations and wanted to
experiment with different models and only build the moment
matrix once!
Example 1: Making a cross product file:
/$ Tests writing of cross products
b34sexec options open('crossp.dat') unit(35) form=formatted
disp=unknown$ b34seend$
b34sexec options clean(35)$ b34seend$
b34sexec options include('c:\b34slm\gas.b34')$ b34srun$
b34sexec data set maxlag=3 writecross heading=('gas data') $
build l1gasin l2gasin l3gasin l1gasout l2gasout l3gasout$
gen l1gasin=lag1(gasin)$
gen l2gasin=lag2(gasin)$
gen l3gasin=lag3(gasin)$
gen l1gasout=lag1(gasout)$
gen l2gasout=lag2(gasout)$
gen l3gasout=lag3(gasout)$
b34seend$
b34sexec regression$
model gasout=gasin l1gasin l2gasin l3gasin
l1gasout l2gasout l3gasout$
b34srun$
Example 2: Reading Cross Product file
/$ Tests reading of cross products
b34sexec options open('crossp.dat') unit(35) form=formatted
disp=unknown$
b34seend$
b34sexec data readcross head noob=293 nvar=10 unit=35$
b34srun$
b34sexec regression$
model gasout=gasin l1gasin l2gasin l3gasin
l1gasout l2gasout l3gasout$
b34srun$
b34sexec regression$
model gasout=gasin $ b34srun$
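The reason WRITECROSS and READCROSS are enough for simple regressions
is that OLS coefficients depend on the data only through the matrix of
cross products. The following Python sketch illustrates that idea
outside B34S. It is not B34S code and does not reproduce the layout of
the unit 35 file; the function names and the tiny data set are
assumptions made up for the illustration.

```python
def cross_products(rows):
    """Accumulate the k-by-k matrix of sums of cross products, one row at a time."""
    k = len(rows[0])
    m = [[0.0] * k for _ in range(k)]
    for r in rows:
        for i in range(k):
            for j in range(k):
                m[i][j] += r[i] * r[j]
    return m


def ols_from_moments(m, y, xs):
    """Solve the normal equations (X'X) b = X'y using only entries of m.

    y is the column index of the dependent variable and xs lists the
    column indices of the regressors.
    """
    n = len(xs)
    # Augmented system [X'X | X'y], built purely from the moment matrix.
    a = [[m[i][j] for j in xs] + [m[i][y]] for i in xs]
    for c in range(n):
        # Partial pivoting, then Gauss-Jordan elimination on column c.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(n):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical data: columns are constant, x, y with y = 1 + 2x exactly.
data = [(1.0, x, 1.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
moments = cross_products(data)          # built once, raw data no longer needed
beta = ols_from_moments(moments, y=2, xs=[0, 1])
print(beta)
```

Once the moment matrix exists, any model using a subset of its columns
can be fitted without rereading the observations, which is why residual
options that need the raw data are not available.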
DATA sentence parameters.
LIST = key If key = FIRST5 will list first 5 observations. This is
the default.
If key = RAW will list original data.
If key = TRANS will list transformed data.
If key = BOTH both RAW and TRANS options are in effect.
Note: This option requires HEAD be set. It is useful
if there is a data read-in problem since series are
listed as they are read.
UNIT = n1 Sets input unit. If specified must be 10 or some value
above 30 unless reading previous current data on
unit 8 in DP unformatted. If the UNIT option is not
specified, data must be input using DATACARDS$ or
PGMCARDS$ sentences. See below examples and input
discussion. In the place of UNIT, the keyword
FILE(' ') can be used. If FILE is used,
UNIT will default to 10 unless set to another
unit. The file will stay open until closed
with the statement
b34sexec options close(10)$ b34srun$
FILE(' ') Sets B34S data file input. In place of FILE(' ')
the user can open the file with open statement.
The following two jobs are the same: The
string ' ' can be up to 72 in length.
b34sexec options open('mydata') unit=10 disp=old$
b34seend$
b34sexec data unit(10)$
input x y z$
b34seend$
b34sexec data file('mydata')$
input x y z$
b34seend$
Note: If the FILE parameter is used in a multistep dataset where the
file to be read is built in a prior step IT IS IMPERATIVE that
B34SRUN be used in place of B34SEEND in the prior step or
ALLRUN be placed in the autoexec.b34 file on the PC. If this
is not done, the DATA step will not find the file.
NOOB = n2 Sets number of observations. If omitted, defaults to
the number of observations in file. If READCROSS
option is set, NOOB must be set. If DATACARDS$
or PGMCARDS$ is set and NOOB is not set, B34S
defaults to using FILEF=@@ and checking each
card for B34SRETURN$. This slows reading of the
data but in recent years is the most widely
used reading option since the missing codes NA,
na, NaN and the SAS code . are seen as missing data.
If NOOB is set FILEF=FREE is used which is
substantially faster since it uses Fortran
free format reading. Hence to improve performance,
it is recommended that NOOB be set when FILEF=@@
reading is not needed.
In a situation where NOOB is set and a CHARACTER
sentence is present, B34S will automatically set
FILEF=@@ if otherwise FILEF=FREE would have been used.
The fastest reading option is FILEF=FIXED which
was used for many years. This allows "jumping" over
variables not needed or in the wrong format. NOOB
should be set.
Warning: If NOOB is set and filef is not set to @@
then it is not possible to process missing
data codes or have more than one observation on
the line.
FILEF = key If key = FREE, data will be read using IBM free
format routines. NOOB need not be set.
If key = DP, data will be read using unformatted
double precision.
If key = FIXED, the FORMAT sentence must be used
to specify the format.
If key = CFIXED the INPUT sentence must be modified
to show the location of the data.
Example:
input x(1,2,3) y(1,4,4);
If key = @@ more than one observation can be on the
line. The LRECL of the file must be LE 512.
This input convention replaces the SAS
missing variable code of . with blanks on
either side with the B34S missing value.
The missing value codes NA, na and NaN are
also supported. If this is used, the
missing value may have to be recoded by a
statement such as
GEN IF(MYDATA .EQ. MISSING())MYDATA=-1.0 $
so that overflows do not occur.
If it is desired to drop these observations
then the statement
GEN X=DIFMISSING(MYDATA)$
should be used. For further information on
missing data, see section 1.22. The key word
DIFMISSING stands for delete if missing.
Note: FREE is the default for FILEF if NOOB NE 0.
@@ is the default if NOOB has not been set.
FILEF=@@ can read data in form 1.2-3.2 as
1.2 and -3.2.
If FILEF=FREE spaces or commas must separate
the numbers since Fortran rules are in effect.
If key = DMF then data will be read from the B34S
DMF (Data management file) allocated to
UNIT = n using the unformatted convention.
If key = FDMF then data will be read from the B34S
formatted DMF (Data management file)
allocated to UNIT = n using the formatted
convention.
Note: B34S version 7.12g and beyond recognize
the appropriate file format so FILEF=FDMF
does not have to be explicitly set for
formatted files. In cases where the operating
system might not be able to detect the
format of a file, DMF and FDMF can be used.
Note: FILEF=DMF or FILEF=FDMF requires that the
INPUT sentence is used if there are more
series in the DMF file than 98 and the
user wants to read series other than the
first 98. More detail on DMF files is
contained in section 40.0. For examples
see section 13.7.
DMFMEMBER(k) Sets DMF member name. If this parameter is omitted
the first member is read.
IBEGIN=n Sets first observation to read from DMF file. If
this parameter is omitted first observation is read.
IEND=nn Sets last observation to read from DMF file. If this
parameter is omitted, last observation is read.
PROBNUM = n3 Sets problem number. If omitted, numbers problems
sequentially.
HEADING =(' ') Sets optional heading. Max of 32 characters.
DEBUG = n4 Set debug value in range 0-9. This option is only
useful for software developers.
WEIGHT = key Sets weight variable. Variable specified by key must
have been mentioned in INPUT or BUILD sentence. This
option does not change data stored on unit 8.
Assume Z = weight variable, Y = new variable and X =
old variable. Transformation used is:
Y = X * (Z ** .5) * ( (1 / mean Z) ** .5 )
MAXLAG = n5 Sets max lag so that the first n5 observations will
not be used in subsequent analysis. MAXLAG must be
specified if LAG or LP operators are used. It is
imperative that MAXLAG be set correctly. The range
for MAXLAG is 0-99.
NVAR=n6 Sets number of variables. Needed if READCROSS
option is set or if BANK sentence is used. In other
situations NVAR is indirectly picked up from INPUT
and BUILD sentences.
Date setting options. For further detail, see section 1.19 and the
OPTIONS paragraph.
SETFREQ(R) - Sets base frequency. 1. = annual data. .1 =
data once per decade. R can be set as real OR
integer.
SETYEAR(NN) - Sets base year for annual data. Frequency assumed
=1.
SETMY(M1,Y1) - Sets base year for monthly data. Frequency assumed
=12.
SETQY(Q1,Y1) - Sets base year for quarterly data. Frequency
assumed = 4.
SETDMY(D1,M1,Y1) Sets base year for daily data. Frequency assumed
=365.
IDVAR = xx Sets the character variable xx as an id variable
for the observation.
IDDATE= xx Sets the variable xx as a julian date variable to
identify each observation.
IDDATETIME=xx Sets the variable xx as a julian date variable and
indicates that time info is saved.
RMISSING( ) Sets data value that will be automatically coded
into B34S missing value code. For other missing
value options, see section 1.22.
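The WEIGHT transformation listed among the parameters above,
Y = X * (Z ** .5) * ( (1 / mean Z) ** .5 ), can be checked by hand with
a short Python sketch. This is an illustration of the formula only, not
B34S code; the function name and the sample values are assumptions.

```python
import math


def apply_weight(x, z):
    """Return Y = X * sqrt(Z) * sqrt(1 / mean(Z)) for each observation."""
    mean_z = sum(z) / len(z)
    scale = math.sqrt(1.0 / mean_z)
    return [xi * math.sqrt(zi) * scale for xi, zi in zip(x, z)]


x = [2.0, 4.0, 6.0]       # hypothetical old variable
z = [1.0, 4.0, 1.0]       # hypothetical weight variable, mean 2.0
y = apply_weight(x, z)
print(y)
```

Dividing by the mean of Z keeps the overall scale of the data: if every
weight is the same, Y equals X.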
INPUT sentence.
INPUT X1 X2 X3 $
A maximum of 99 series can be input. B34S variable names are 1-8
characters and follow SAS variable naming conventions. If the BUILD
sentence is used, the total number of series in B34S must be LE 69. A
CONSTANT variable is automatically added to the B34S data set unless the
NOCONSTANT option was specified. B34S has a limited ability to process
CHARACTER data in release 6.23 and above. If any of the variables listed
on the INPUT sentence are CHARACTER type, they must be listed in the
CHARACTER sentence which is discussed next. For further detail on
CHARACTER variables, see the details discussed in section 13.1
"Processing Character Data in B34S."
If FILEF=CFIXED then the INPUT sentence must be modified
INPUT X1(icard,icolstart,icolend) X2( ) $
For example, if X1 is on card # 1 in col 5-9, X2 is on card # 1
in col 23-30 and X3 is on card # 2 in col 6-12, the INPUT sentence would
be:
INPUT X1(1,5,9) X2(1,23,30),X3(2,6,12)$
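To make the (icard,icolstart,icolend) convention concrete, the Python
sketch below pulls the same three fields out of a two-card observation.
It is not B34S code. The function name, the cards_per_obs argument and
the sample cards are assumptions for the illustration; columns are
counted from 1 and the end column is inclusive, as in the example above.

```python
def read_cfixed(cards, layout, cards_per_obs):
    """Extract named fields from fixed columns of multi-card observations."""
    observations = []
    for start in range(0, len(cards), cards_per_obs):
        group = cards[start:start + cards_per_obs]
        row = {}
        for name, (icard, col_start, col_end) in layout.items():
            # Convert 1-based inclusive columns to a Python slice.
            field = group[icard - 1][col_start - 1:col_end]
            row[name] = float(field)
        observations.append(row)
    return observations


# Layout matching INPUT X1(1,5,9) X2(1,23,30) X3(2,6,12)
layout = {"X1": (1, 5, 9), "X2": (1, 23, 30), "X3": (2, 6, 12)}

# Hypothetical cards: X1 in col 5-9 and X2 in col 23-30 of card 1,
# X3 in col 6-12 of card 2.
card1 = " " * 4 + " 12.5" + " " * 13 + "   3.250"
card2 = " " * 5 + "  100.0"

obs = read_cfixed([card1, card2], layout, cards_per_obs=2)
print(obs)
```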
CHARACTER sentence.
CHARACTER Xn1 Xn3$
Variables containing character data must be listed on a CHARACTER
sentence. For further detail on processing of character data, see
section 13.1. The CHARACTER sentence need not pass any variable names.
BUILD sentence.
BUILD Xk1 Xk2 $
The BUILD sentence allocates additional B34S variable names which
are built with GEN sentences. See below for form of GEN sentences.
FORMAT sentence.
FORMAT=('(4E16.8, 2X,18F2.0)') $
The FORMAT sentence is required if FILEF = FIXED. A maximum of 480
characters is allowed. If the FORMAT sentence extends over more than one
card, be sure to stop in col 72 on each card.
LABEL sentence.
LABEL X1='MORE INFO ON VARIABLE X1'
X2='MORE INFO ON VARIABLE X2' $
The LABEL sentence allows the user to provide more information on
a variable than is allowed by the 8 character name. Up to 40 columns of
text can be provided inside the ' '. The LABEL sentence is optional.
RENAME sentence.
RENAME oldname = newname $
The RENAME sentence is used to replace a name in a dataset with
a new name. It is usually used in situations where the SET statement
is used. Assume X1 is in the original dataset. The statements
b34sexec data set $ build y z$
gen y=x2*x2$
gen z=x2**2$
rename x1=newx1$
gen newx1=y/z$
allow reusing the location X1. A rename without more GEN statements
leaves the old values in place. RENAME statements are executed after
INPUT & BUILD statements. Once a RENAME statement has been found, the
old name is not available.
COMMENT sentence.
COMMENT=('Any text here') $
Any number of comment cards can be used. These will be printed by
the B34S DATA step. A max of 78 characters is allowed.
DATACARDS, PGMCARDS and B34SRETURN sentences.
If the UNIT parameter is not specified to point B34S to a data
unit for input, the DATACARDS$ or PGMCARDS$ sentences are used to input
data. The former sentence will not list datacards on log, while the
latter will. The below listed example shows loading of three series and
building another series which is the sum of the second and third series.
b34sexec data noob=5$
input x y z$
build ypz$
gen ypz = y + z$
datacards$
11 22 33
44 88 99
35 11 19
23 32 11
14 24 36
b34sreturn$
b34seend$
b34sexec rr$
model ypz = x$
b34seend$
Finally a regression is run. Note that free format is used. Since NOOB
was specified, FILEF=FREE was implicitly used. The command PGMCARDS$
could have been substituted for DATACARDS$ command. The DATACARDS and
PGMCARDS sentences read col 1-80 by default. If line numbers are present
and a free format read is used which reads more variables than fit on
one line, problems will occur. The CARD72 option on the DATACARDS and
PGMCARDS sentences in the DATA paragraph will remove line numbers. The
CARD72 command has not been used below because it is not needed.
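The practical difference between FILEF=@@ reading and free format
reading, as described under the NOOB and FILEF parameters, is that @@
reading scans each card for B34SRETURN$, accepts several observations
on one line, treats NA, na, NaN and an isolated . as missing, and
splits a run such as 1.2-3.2 into 1.2 and -3.2. The Python sketch below
imitates that behaviour under stated assumptions. It is not the B34S
reader: the function name is made up, None stands in for the B34S
missing value code, and the number pattern is only a plausible guess at
what counts as a number.

```python
import re

MISSING_CODES = {"NA", "na", "NaN", "."}
# Signed decimal number with an optional exponent (an assumed pattern).
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def read_at_at(lines, nvar):
    """Read values card by card until b34sreturn$, nvar values per observation."""
    values = []
    for line in lines:
        if line.strip().lower() == "b34sreturn$":
            break                      # footer found, stop reading
        for token in line.split():
            if token in MISSING_CODES:
                values.append(None)    # placeholder for the missing value code
            else:
                # A token like 1.2-3.2 yields two numbers.
                values.extend(float(m) for m in NUMBER.findall(token))
    return [values[i:i + nvar] for i in range(0, len(values), nvar)]


cards = ["11 22 33 44", "1.2-3.2 NA .", "b34sreturn$"]
obs = read_at_at(cards, nvar=2)
print(obs)
```

Scanning every card this way is what makes @@ reading slower than
FILEF=FREE, which is why setting NOOB is recommended when none of these
features is needed.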
13.1 Processing Character Data
Since version 6.23 B34S has had a limited ability to process character
data in the B34S DATA command. The B34S MATRIX command has
substantially more capability in this area since more data types are
supported and a number of facilities are available.
B34S character variables are limited to a maximum of 8 characters.
Character data must be read with FILEF=FIXED or FILEF=@@.
Character data can only be listed with the B34S LIST command. If a
character string is greater than eight characters and is read with
FILEF=@@, only the first 8 characters will be used. All character
variables must be listed on the CHARACTER sentence. If character data
is passed to a procedure other than LIST or to a GEN statement in the
DATA step, unpredictable results can occur unless the GEN function can
process character data. Additional character capability may be added in
future B34S versions. B34S CHARACTER data cannot be saved in
FSV files, nor can character data be explicitly saved in files with the
SCAINPUT command. The reason for these restrictions is that SCA and
RATS do not support these data types. Data sets containing character
data can be modified using the SET option of the DATA command. Some of
these restrictions may change in future releases. A few examples of
character data are given below.
b34sexec data noob=3 filef=@@$
input x y$
character x$
datacards$
aa 111
bb 222
cc 333
b34sreturn$
b34seend$
b34sexec list$
b34seend$
b34sexec data noob=3 filef=fixed$
input x y$
character x$
format('(a2, f4.0)')$
datacards$
aa 111
bb 222
cc 333
b34sreturn$
b34seend$
b34sexec list$
b34seend$
b34sexec data$
input d m y$
character d$
datacards$
12a 12 86
12b.b 12 1986
c12 12 1902
_12c 12 1886
12.g 12 1786
1aa 1 2002
b34sreturn$
b34seend$
b34sexec list$ b34seend$
GEN statements for character data are limited to copy and logical
operators. If character data is placed in other GEN statements,
unpredictable results may occur. The example given below illustrates
what is possible.
b34sexec data $ input x y $
build xx test test1 test2 yy $
character x y yy xx$
gen yy='a'$
gen if(x.eq.yy) test=100$
gen if(x.eq.'a')test1=100$
gen if('a'.eq.x)test2=100$
gen xx='abcd'$
datacards$
a b
c c
e f
b34sreturn$
b34seend$
b34sexec list$ b34seend$
In all logical operations involving .eq. , the result is NOT case
sensitive. Dates can be converted to character representation using
the GEN functions CHARDATE and CHARDATEMY.
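Two of the character rules above, the 8 character limit and the case
insensitive .eq. comparison, can be pictured with the following Python
sketch. It is an illustration only, not B34S code. The function names
are hypothetical, and padding short strings with blanks is an
assumption based on the usual Fortran Character*8 convention.

```python
def to_char8(text):
    """Keep the first 8 characters and pad with blanks to width 8."""
    return text[:8].ljust(8)


def char_eq(a, b):
    """Compare two character values without regard to case."""
    return to_char8(a).upper() == to_char8(b).upper()


print(repr(to_char8("abcdefghij")))   # only the first 8 characters survive
print(char_eq("A", "a"))              # comparison ignores case
print(char_eq("a", "b"))
```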
Extensive character processing can be done using the MATRIX command.
The below listed job illustrates passing data to this command:
b34sexec data noob=3 filef=@@$
input x y$
character x$
datacards$
aa 111
bb 222
cc 333
b34sreturn$
b34seend$
b34sexec list$
b34seend$
b34sexec matrix;
call loaddata;
call names(all);
call tabulate(x,y);
b34srun;
Edited Output produced:
B34S Matrix Command. Version December 2002.
Date of Run d/m/y 19/ 4/03. Time of Run h:m:s 12:16:38.
=> CALL LOADDATA$
=> CALL NAMES$
# Name Type Klass Row-Col Label
1 X Char*8 D1array 3 by 1
2 Y Real*8 D1array 3 by 1
3 CONSTANT Real*8 D1array 3 by 1
Space available 7869961 , used 122 , peak used 122
# Temp varibles 1 , peak # used 4
=> CALL TABULATE(X,Y)$
Obs X Y
1 aa 111.0
2 bb 222.0
3 cc 333.0
B34S Matrix Command Ending. Last Command reached.
Space available in allocator 7869961, peak space used 122
Number variables used 4, peak number used 4
Number temp variables used 1, # user temp clean 0
13.2 Data Building Options - GEN sentence
GEN sentence.
GEN XNEW = FUNCTION(arg1,arg2, .... ) $
GEN XNEW = analytic statement here $
The GEN sentence allows building of variables. If variable names
are used, they must have been mentioned on prior INPUT or BUILD
sentences. The same variable can be used over and over again as a
temporary variable if desired. The GEN sentence only works for data
loaded with INPUT or BUILD commands since the BANK command does not know
variable names.
GEN sentence FUNCTIONS. In all cases XNEW is the variable built, REALN
= any real number (a max of 8 digits, including
the decimal point, is allowed.) INUM = integer.
Some functions use character input CHARVAR.
It is important that argument sequences be
followed exactly. All variables are initialized
to 0.0 by the BUILD sentence.
Examples of GEN sentence functions.
gen x=missing();
gen y=sqrt(z);
Examples of GEN sentence Analytic Statements
gen x=x*x$
gen y=(x+2.)/(j + kkk)$
gen vv = xx - jj + kk $
gen x =(x-lag(x))/lag(x) $
gen x =sin(y*q)/2.0 $
gen test5=log(exp(log(exp(5.0))))$
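The statement gen x =(x-lag(x))/lag(x) $ shows why MAXLAG matters: the
first observation has no lagged value, so it cannot be used. The Python
sketch below works through the same calculation. It is not B34S code;
the function names are assumptions, and None is only a placeholder since
this section does not say what B34S stores in the unavailable
positions.

```python
def lag(series, n=1):
    """Return the series shifted by n periods, y(t) = x(t-n)."""
    return [None] * n + series[:len(series) - n]


def pct_change(series, maxlag=1):
    """Compute (x - lag(x)) / lag(x) and drop the first maxlag observations."""
    lagged = lag(series, 1)
    out = [None if prev is None else (cur - prev) / prev
           for cur, prev in zip(series, lagged)]
    return out[maxlag:]


x = [100.0, 110.0, 121.0]          # hypothetical series
growth = pct_change(x, maxlag=1)
print(growth)
```

With maxlag set too low, the placeholder for the missing lag would stay
in the sample, which is the situation MAXLAG is meant to prevent.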
IF-THEN statements are allowed with the following operators.
.EQ. .NE. .LT. .LE. .GE. .GT. .AND. .OR.
Note that .GE. is recognized but . GE . is not.
gen if ( )_______$
must have _______ as THEN or a valid analytic statement without the
GEN.
Examples of IF-THEN statements.
gen if(x .eq. 2.0)y=x*x$
gen if(x.eq.2.0)y=x*x$
gen if(x.ne.y)then$
gen x=x**2$
gen q=q/x$
gen endif$
More complex statements can be used
gen if(x .ne. y .and. z .lt. v)then$
gen x=x**2$
gen q=q/x$
gen endif$
Valid analytic expressions can be used inside the IF-THEN
construction. For example
gen if(x**2 .ne. y .and. sin(z) .lt. v)then$
gen x=x**2$
gen q=q/x$
gen xx=log10(2.0*exp(x))/dsqrt(log(x))$
gen endif$
The statement
gen if(x .eq. 2.0)y=x*x$
is allowed but the statement
gen if(x .eq. 2.0)y=boxcox(x,2.0)$
is not allowed since BOXCOX is not supported as part of an analytic
statement. If the logic of the above statement is wanted the correct
form would be
gen if(x .eq. 2.0)then$
gen y=boxcox(x,2.0)$
gen endif$
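The effect of the three-statement form above can be traced with a short
Python sketch. It uses the BOXCOX definition from the function listing,
XNEW=((XOLD**REALN)-1.)/REALN, and the rule that BUILD initializes
variables to 0.0. It is an illustration, not B34S code, and the sample
values are assumptions. The case REALN = 0 is not handled because this
section does not say how B34S treats it.

```python
def boxcox(xold, realn):
    """Box-Cox transform as listed: ((XOLD**REALN) - 1) / REALN."""
    return (xold ** realn - 1.0) / realn


xs = [1.0, 2.0, 3.0]               # hypothetical observations of x
y = [0.0] * len(xs)                # BUILD initializes new variables to 0.0

for i, x in enumerate(xs):
    if x == 2.0:                   # gen if(x .eq. 2.0)then$
        y[i] = boxcox(x, 2.0)      # gen y=boxcox(x,2.0)$

print(y)
```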
B34S Functions allowed as part of analytic statements. More data
building capability is provided in the MATRIX Command.
See more detailed help below.
Function Example of use Task
ABS gen y=abs(x); y=|x|
ASIN gen y=asin(x); Arc sin
BETAPROB gen y=betaprob(x1,x2,x3); Beta probability
CDAY gen day=cday(); Character form of Day
CHARDATE gen cd1=chardate(julian); Returns dd\mm\yy
CHARDATEMY gen cd2=chardatemy(julian); Returns mm\yyyy
CHARTIME gen ctime=chartime(julian); Returns hh:mm:ss
CHISQPROB gen csprob=chisqprob(x1,x2); Chisq prob of x1 with DF x2
.5 le x2 le 2000
CHTOREAL gen nreal=chtoreal(xchar); Converts Character to real
CJULDAY gen juldate=cjulday(); Julian for that obs.
CMONTH gen month=cmonth(); Months for that Obs.
COS gen y=cos(x); Cosine of x in y.
CQT gen quarter=cqt(); Quarter for that obs.
CYEAR gen year=cyear(); Year for that obs.
DABS gen y=dabs(x); y=|x|.
DARCOS gen y=darcos(x); y= acos(x)
DASIN gen y=dasin(x); y=asin(x)
DATAN gen y=datan(x); y=atan(x);
DATAN2 gen y=datan2(x1,x2); y=datan2(x1,x2);
DCOS gen y=dcos(x); Cosine of x in y.
DCOSH gen y=dcosh(x); Hyperbolic cosine x in y.
DELOBS gen y=delobs(); Deletes observation.
DERF gen y=derf(x); See Fortran Manual.
DERFC gen y=derfc(x); See Fortran Manual.
DEXP gen y=dexp(x); y=e**x
DGAMMA gen y=dgamma(x); Integral from 0 to inf of
u**(x-1)*e**(-u)du
DIFMISSING gen y=difmissing(x); Deletes obs if x=missing
DINT gen y=dint(x); Integer part of x.
DLGAMA gen y=dlgama(x); Log gamma function.
DLOG gen y=dlog(x); Natural log x in y.
DLOG10 gen y=dlog10(x); Log base 10 x in y.
DMAX1 gen y=dmax1(x1,x2); y = max of x1 x2
DMIN1 gen y=dmin1(x1,x2); y = min of x1 x2
DMOD gen y=dmod(x1,x2); y = remainder of x1/x2.
DSIN gen y=dsin(x); y = sin of x.
DSINH gen y=dsinh(x); y = hyperbolic sin of x.
DSQRT gen y=dsqrt(x); y = x**.5
DTAN gen y=dtan(x); y = tan of x
DTANH gen y=dtanh(x); y = hyperbolic tan of x
EXP gen y=exp(x); y = e**x
EXTRACT gen cy=extract(charvar,i,j)$ chxnew=charvar(i:j)
i, j must be in range 1-8
FDAYHMS gen y=fdayhms(h,m,s); Sets fraction of day given
hour, minute, second
FIND gen y=find(charvar,' '); finds location blank in
charvar
FPROB gen y=fprob(fval,df1,df2); Probability of F(df1,df2)
FYEAR gen y=fyear(juldate); Fraction of a year.
GETDAY gen y=getday(juldate); Day of year.
GETHOUR gen y=gethour(juldate); Hour of day.
GETMINUTE gen y=getminute(juldate); Minute of day.
GETMONTH gen y=getmonth(julday); Month of year.
GETQT gen y=getqt(julday); Quarter of year.
GETSECOND gen y=getsecond(julday); Second of day.
GETYEAR gen y=getyear(julday); Year.
IKOUNT gen y=ikount(); # of obs read.
INT gen y=int(x); y set to integer part of x.
INVBETA gen y=invbeta(x1,x2,x3); Inverse of Beta distribution.
x1 is probability
INVCHISQ gen y=invchisq(x1,x2); Inverse chi-squared.
0 le x1 le 1.0
.5 le x2 le 2,000,000
INVFDIS gen y=invfdis(x1,x2,x3); x1 = probability
x2 and x3 DF
INVTDIS gen y=invtdis(x1,x2); x1 = probability
x2 = DF
JULDAYDMY gen y=juldaydmy(day,m,year); gets julday
JULDAYQY gen y=juldayqy(qt,year); gets julday
JULDAYY gen y=juldayy(year); gets julday
KOUNT gen y=kount(); gets observation number
LAG gen y=lag(x); y(t)=x(t-1)
LAGn gen y=lagn(x); y(t)=x(t-n)
LOG gen y=log(x); y=natural_log(x)
LOG10 gen y=log10(x); y=log10(x)
MAKEINT gen y=makeint(x); y=integer part of x
MISSING gen y=missing(); y set to missing
MOVELEFT gen chnew=moveleft(chold,n); chold moved left n
MOVERIGHT gen chnew=moveright(chold,n); chold moved right n
NCCHISQ gen y=ncchisq(x1,x2,x3); Non central chi-square
x1 variable GE 0 for which to
calculate probability
x2 degrees of freedom (GE .5)
x3 noncentrality
.5 LE (x2+x3) LE 200000
NORMDEN gen y=normden(z); y= density of normal
distribution
NOT gen y=not(x); x=1.0 => y=0.0
x=0.0 => y=1.0
NOTFIND gen y=notfind(chold,' '); y = first nonblank
PLACE gen chnew=place(ch,i,j); chnew(I:I+J-1)=ch(1:J-I+1)
i, j must be in range 1-8
PROBIT gen y=probit(p); inverse normal distribution.
p = probability.
PROBNORM gen y=probnorm(z); normal distribution
probability
REALTOCH gen chav=realtoch(real); Convert real*8 to Ch*8
REC gen y=rec(); Variable from rectangular
distribution
RECCS gen y=reccs(); Generates random rectangular
number. Uses common seed.
This is only appropriate if
the option RECVER(RAND) is
in effect.
RN gen y=rn(); y = random normal deviate
SIN gen y=sin(x); y = sin of x.
SQRT gen y=sqrt(x); y = x**.5
TIMESPI gen y=timespi(x); y = x*pi
TPROB gen y=tprob(tval,df); y = probability of tval
given DF df.
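As a cross-check on two of the functions in this table, the detailed
listing notes that PROBNORM can be calculated as
.5+.5*DERF(XOLD/SQRT(2.0)), and PROBIT is described as its inverse. The
Python sketch below reproduces that identity with the standard library
error function and inverts it by bisection. It is an illustration, not
the algorithm B34S uses; the bisection search and its bounds are
assumptions chosen for simplicity.

```python
import math


def probnorm(z):
    """Standard normal probability via the error function identity."""
    return 0.5 + 0.5 * math.erf(z / math.sqrt(2.0))


def probit(p, lo=-10.0, hi=10.0):
    """Invert probnorm by bisection (illustrative, not the B34S method)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if probnorm(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(probnorm(0.0))
print(probnorm(1.96))
print(probit(0.975))
```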
Detailed discussion and examples of data building functions. Note
not all of these functions are allowed in an analytic statement.
Command Description Old B34S TG #
GEN XNEW = SQRT(XOLD) $ Square root 1
GEN XNEW = LOG(XOLD) $ Natural Log 2
GEN XNEW = BOXCOX(XOLD,REALN) $ XNEW=((XOLD**REALN)-1.)/REALN 2
GEN XNEW = LOG10(XOLD) $ Log to base 10 3
GEN XNEW = EXP(XOLD) $ XNEW = e ** XOLD 4
GEN XNEW = POWER(XOLD1,XOLD2) $ XNEW = XOLD1 ** XOLD2 5
GEN XNEW = POWER(REALN,XOLD) $ XNEW = REALN ** XOLD 6
GEN XNEW = POWER(XOLD,REALN) $ XNEW = XOLD ** REALN 10
GEN XNEW = INV(XOLD) $ XNEW = 1.0 / XOLD 7
GEN XNEW = ADD(XOLD,REALN) $ XNEW = XOLD + REALN 8
GEN XNEW = ADD(XOLD1,XOLD2) $ XNEW = XOLD1 + XOLD2 11
GEN XNEW = MULT(XOLD,REALN) $ XNEW = XOLD * REALN 9
GEN XNEW = MULT(XOLD1,XOLD2) $ XNEW = XOLD1 * XOLD2 13
GEN XNEW = SUB(XOLD1,XOLD2) $ XNEW = XOLD1 - XOLD2 12
GEN XNEW = DIV(XOLD1,XOLD2) $ XNEW = XOLD1 / XOLD2 14
GEN XNEW = GE(XOLD,REALN) $ XNEW = 1 if XOLD GE REALN 15
GEN XNEW = GE(XOLD1,XOLD2) $ XNEW = 1 if XOLD1 GE XOLD2 16
GEN XNEW = ASIN(XOLD) $ XNEW = ASIN(XOLD) 17
GEN DEL(XOLD) $ If XOLD = 0, observation dropped 18
GEN DEL(XOLD,REALN) $ If XOLD = REALN, obs dropped 19
GEN XNEW = SIN(XOLD) $ XNEW = SIN(XOLD) 20
GEN XNEW = COS(XOLD) $ XNEW = COS(XOLD) 21
GEN XNEW = LAG(XOLD) $ XNEW = lag of XOLD 22
GEN XNEW = LAG01(XOLD) $ XNEW = lag of XOLD 22
GEN XNEW = LAG1(XOLD) $ XNEW = lag of XOLD 22
GEN XNEW = LAG6(XOLD) $ XNEW = lag 6 OF XOLD 22
GEN GOTOTGN(XNEW,ITGN) $ Goes to TG NUMBER ITGN if XNEW=1.0 23
Number set as
GEN CONTINUE( ITGN) $
GEN CONTINUE( ) $ Label inside ( ) 24
GEN DOWN(XOLD,II) $ Jump down II TG spaces if XOLD=1.0 25
GEN UP(XOLD,II) $ Jump up II TG spaces if XOLD=1.0 26
GEN XNEW = GT(XOLD1,XOLD2) $ If XOLD1 GT XOLD2 then XNEW = 1.0 27
GEN XNEW = GT(XOLD1,REALN) $ If XOLD1 GT REALN then XNEW = 1.0 28
GEN XNEW = LT(XOLD1,XOLD2) $ If XOLD1 LT XOLD2 then XNEW = 1.0 29
GEN XNEW = LT(XOLD1,REALN) $ If XOLD1 LT REALN then XNEW = 1.0 30
GEN XNEW = LE(XOLD1,XOLD2) $ If XOLD1 LE XOLD2 then XNEW = 1.0 31
GEN XNEW = LE(XOLD1,REALN) $ If XOLD1 LE REALN then XNEW = 1.0 32
GEN XNEW = NE(XOLD1,XOLD2) $ If XOLD1 NE XOLD2 then XNEW = 1.0 33
GEN XNEW = NE(XOLD1,REALN) $ If XOLD1 NE REALN then XNEW = 1.0 34
GEN READ( ) $ Read a new observation at once 35
GEN XNEW = COPYV(XOLD1) $ XNEW = XOLD1 36
GEN XNEW = COPYV(REALN) $ XNEW = REALN 37
GEN GOTO( ) $ Go to GEN label ( ) at once 38
GEN XNEW = AND(XOLD1,XOLD2) $ If XOLD1=XOLD2=1.0 then XNEW = 1.0 39
GEN XNEW = OR(XOLD1,XOLD2) $ If XOLD1 or XOLD2 =1.0 XNEW = 1.0 40
GEN XNEW = DABS(XOLD) $ XNEW = ABS(XOLD) 41
GEN XNEW = NOT(XOLD) $ If XOLD=1.0 => XNEW=0.0, 42
If XOLD=0.0 => XNEW=1.0. If XOLD
is NE 0.0 & NE 1.0, XNEW = 1.D+32
GEN XNEW = DINT(XOLD) $ XNEW = integer part of XOLD 43
GEN XNEW= DELOBS() $ Deletes observation 44
GEN XNEW= DIFMISSING(XOLD) $ If XOLD= missing, obs is dropped 45
GEN XNEW = EQ(XOLD,REALN) $ If XOLD EQ REALN, XNEW = 1 46
GEN XNEW = EQ(XOLD1,XOLD2) $ If XOLD1 = XOLD2, XNEW = 1 47
GEN XNEW = KOUNT() $ XNEW = observation number 48
GEN XNEW = IKOUNT() $ XNEW = number of observation read 49
GEN KOUNTDEL(REALN) $ If KOUNT= REALN, obs deleted 50
GEN IKOUNTDEL(REALN) $ If IKOUNT = REALN, obs deleted 51
GEN BACKSPACE() $ If KOUNT NE NOOB, backspace 5 52
GEN XNEW = REC() $ Generates rectangular number 53
GEN XNEW = RECCS() $ Generates rectangular number 54
uses common seed. This is only
appropriate if the option
RECVER(RAND) is in effect
GEN XNEW = RN() $ Generates random normal number 57
having mean = 0.0 and SD = 1.
GEN XNEW= BUILDS(REALN1,REALN2)$ Builds a seasonal in XNEW from 1 -
REALN2, starting in REALN1. Maximum
value for REALN1 = 99. 59
GEN XNEW = DTAN(XOLD) $ XOLD LE (2**50) * pi 60
GEN XNEW = DARCOS(XOLD) $ ABS(XOLD) LE 1.0 61
GEN XNEW = DATAN(XOLD) $ XOLD = any real number 62
GEN XNEW = DATAN2(XOLD1,XOLD2) $ XOLD1, XOLD2 any real number not 0 63
GEN XNEW = DSINH(XOLD) $ XOLD LE 175.366, if X=XOLD 64
XNEW=(e**X - e**-x)/2
GEN XNEW = DCOSH(XOLD) $ XOLD LT 175.366, if X=XOLD 65
XNEW=(e**x + e**-x)/2
GEN XNEW = DTANH(XOLD) $ XOLD = any real number 66
GEN XNEW = DGAMMA(XOLD) $ 2**(-252) LE XOLD LE 2**(252) 67
integral from 0 to inf of
u**(XOLD-1)*e**(-u)du
GEN XNEW = DLGAMA(XOLD) $ log gamma function 68
GEN XNEW = DERF(XOLD) $ see FORTRAN manual 69
GEN XNEW = DERFC(XOLD) $ see FORTRAN manual 70
GEN XNEW = TIMESPI(XOLD) $ XNEW = XOLD * pi 71
GEN XNEW = PROBNORM(XOLD) $ XNEW = probability of normal 72
distribution. Can be
calculated as
.5+.5*DERF(XOLD/SQRT(2.0))
Note: The density of the normal distribution is calculated by
GEN DEN=DEXP(-1.0*(Z*Z)/2.0)/(DSQRT(TIMESPI(2.0)))$
or by use of the NORMDEN function.
GEN XNEW = DMAX1(XOLD1,XOLD2) $ XNEW = max of XOLD1 XOLD2 73
GEN XNEW = DMIN1(XOLD1,XOLD2) $ XNEW = min XOLD1 XOLD2 74
GEN XNEW = DMOD(XOLD1,XOLD2) $ XNEW = remainder of XOLD1/XOLD2 75
GEN XNEW = MISSING() $ XNEW = missing value 88
GEN XNEW = PROBIT(XOLD) $ XNEW = inverse normal of XOLD. 89
XOLD must be in range (0.0 1.0)
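The PROBNORM identity and the normal density note above can be checked numerically outside B34S. The following Python sketch is only an illustration and is not B34S code. The function names probnorm, normden and probit are hypothetical stand-ins that mirror the GEN functions, and the Python standard library class statistics.NormalDist is used as an independent reference.

```python
import math
from statistics import NormalDist

def probnorm(x):
    """Normal probability via the error function: .5 + .5*erf(x/sqrt(2))."""
    return 0.5 + 0.5 * math.erf(x / math.sqrt(2.0))

def normden(z):
    """Normal density: exp(-z*z/2) / sqrt(2*pi)."""
    return math.exp(-1.0 * (z * z) / 2.0) / math.sqrt(2.0 * math.pi)

def probit(p):
    """Inverse normal; p must lie strictly between 0.0 and 1.0."""
    return NormalDist().inv_cdf(p)

if __name__ == "__main__":
    ref = NormalDist()  # standard normal, mean 0 and sd 1
    for x in (-1.5, 0.0, 1.96):
        # each pair of columns should agree to rounding error
        print(x, probnorm(x), ref.cdf(x), normden(x), ref.pdf(x))
    print(probit(probnorm(1.0)))  # round trip back to about 1.0
```

The round trip in the last line illustrates that PROBIT is the inverse of PROBNORM on the open interval (0.0 1.0).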
Implicit Date capability.
Assuming the user has set a base date on the OPTIONS card or in the DATA
paragraph using the functions SETYEAR, SETMY or SETDMY and SETFREQ, the
commands:
GEN JULDATE=CJULDAY()$ 76
GEN DAY=CDAY()$ 77
GEN MONTH=CMONTH()$ 78
GEN YEAR=CYEAR()$ 79
GEN QUARTER=CQT()$ 80
obtain the relative julian date information for that observation.
Explicit Date Capability.
Given data for the arguments, the date can be manipulated with the
commands
GEN JULDATE=JULDAYDMY(DAY,MONTH,YEAR)$ 81
GEN JULDATE=JULDAYQY(QUARTER,YEAR)$ 82
GEN JULDATE=JULDAYY(YEAR)$ 83
GEN DAY=GETDAY(JULDATE)$ 84
GEN MONTH=GETMONTH(JULDATE)$ 85
GEN YEAR=GETYEAR(JULDATE)$ 86
GEN QUARTER=GETQT(JULDATE)$ 87
GEN XNEW=FYEAR(JULDATE)$ Gets fraction of a year such as 1958.5 95
GEN XNEW=GETHOUR(JULDATE)$ 96
GEN XNEW=GETSECOND(JULDATE)$ 97
GEN XNEW=GETMINUTE(JULDATE)$ 98
GEN XNEW=FDAYHMS(HOUR,MINUTE,SECOND)$ Sets fraction of a day 111
The commands GETHOUR, GETMINUTE, GETSECOND and FDAYHMS
truncate HOUR, MINUTE and SECOND to integer values in
the ranges (0-24), (0-60) and (0-60) respectively.
B34S saves dates as integers. 12:00 noon on 31/9/1942 would be set as
GEN JULDATE=JULDAYDMY(31,9,1942)+.5 $
This could also be done as
GEN JULDATE = JULDAYDMY(31,9,1942) + FDAYHMS(12,0,0) $
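The relation between the integer day and the fraction of a day can be sketched in Python. This is an illustration under stated assumptions, not the B34S implementation: it assumes that FDAYHMS returns elapsed seconds divided by 86400 (which agrees with FDAYHMS(12,0,0) = .5 above), the helper names fdayhms and split_julian are hypothetical, and the day number 1000 is an arbitrary placeholder rather than a real B34S julian date.

```python
def fdayhms(hour, minute, second):
    """Fraction of a day for a time of day (assumed: seconds / 86400)."""
    h, m, s = int(hour), int(minute), int(second)  # truncate to integers
    return (h * 3600 + m * 60 + s) / 86400.0

def split_julian(juldate):
    """Split a date value into its integer day and (hour, minute, second)."""
    day = int(juldate)
    secs = round((juldate - day) * 86400.0)  # whole seconds into the day
    return day, (secs // 3600, (secs % 3600) // 60, secs % 60)

if __name__ == "__main__":
    noon = 1000 + fdayhms(12, 0, 0)   # 1000 is a placeholder day number
    print(noon)                        # integer day plus .5
    print(split_julian(1000 + fdayhms(6, 30, 15)))
```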
The following commands can manipulate character variables.
GEN XNEW=CHTOREAL(CHAR)$ Converts a character number to a real #. 99
The function must be used with caution
since char must be a number.
GEN CHAR=REALTOCH(REAL)$ Converts a real number to character #. 108
The function must be used with caution
since REAL must fit inside 8 characters.
GEN CHXNEW=EXTRACT(CHARVAR,I,J)$ CHXNEW=CHARVAR(I:J) 100
I, J must be in range 1-8
GEN CHXNEW=PLACE(CHARVAR,I,J)$ CHXNEW(I:J)=CHARVAR(1:J-I+1) 101
I, J must be in range 1-8
GEN CHXNEW=MOVERIGHT(CHARVAR,N)$ CHARVAR moved right N 102
N must be LE 8
GEN CHXNEW=MOVELEFT(CHARVAR,N)$ CHARVAR moved left N 103
GEN CHXNEW=CHARDATE(JULIAN)$ Produces dd\mm\yy 104
GEN CHXNEW=CHARDATEMY(JULIAN)$ Produces mm\yyyy 122
GEN XNEW=IWEEK(julian); Produces 1=Monday etc 123
GEN CHWNEW=CWEEK(JULIAN); Produces 'Monday' etc 124
GEN CHXNEW=CHARTIME(JULIAN)$ Produces hh:mm:ss 105
GEN XNEW =FIND(CHARVAR,' ')$ Finds location of ' ' 106
GEN XNEW =NOTFIND(CHARVAR,' ')$ Location where ' ' not found 107
GEN CHXNEW=MASKADD(CHVAR1,CHVAR2)$ Places characters in CHXNEW only 109
if that col is blank in CHVAR1 and
there is a character in that col in
CHVAR2.
GEN CHXNEW=MASKSUB(CHVAR1,CHVAR2)$ Puts blanks in CHXNEW only where 110
non blank characters are in
CHVAR2.
The following example illustrates EXTRACT, PLACE, MASKADD, MASKSUB.
/$ Tests Character routines
b34sexec data noob=4 filef=@@$
input ch$
build ch2 ch3 ch4 ch3p4 chm2 find1 findnblk$
character ch ch2 ch3 ch4 ch3p4 chm2$
gen ch2 =extract(ch,3,4)$
gen ch3 =place(extract(ch,7,8),1,2)$
gen ch4 =place(ch,7,8)$
gen ch3p4 =maskadd(ch3,ch4)$
gen chm2 =masksub(ch,ch2)$
gen find1 =find(ch,'1')$
gen findnblk=notfind(ch,' ')$
datacards$
abcdefgh
12345678
87654321
hgfecba
b34sreturn$
b34seend$
b34sexec list$
var ch ch2 ch3 ch4$
b34seend$
b34sexec list$
var ch3p4 chm2 find1 findnblk$
b34seend$
Output from running this example follows:
Listing for observation 1 to observation 4.
OBS CH CH2 CH3 CH4
1 abcdefgh cd gh ab
2 12345678 34 78 12
3 87654321 65 21 87
4 hgfecba fe a hg
Listing for observation 1 to observation 4.
OBS CH3P4 CHM2 FIND1 FINDNBLK
1 gh ab cdefgh 0.00000000 1.0000000
2 78 12 345678 1.0000000 1.0000000
3 21 87 654321 8.0000000 1.0000000
4 a hg fecba 0.00000000 1.0000000
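The behavior of EXTRACT, PLACE, MASKADD, MASKSUB, FIND and NOTFIND in the example above can be mimicked in Python. The sketch below is a reading of the listed output and not the B34S source. It assumes that EXTRACT left-justifies its result, that PLACE fills columns I to J of a blank field, and that MASKADD and MASKSUB start from a copy of their first argument. Since the listing does not show exact column positions, the blank padding produced here is an assumption.

```python
WIDTH = 8  # B34S character variables hold 8 characters

def pad(s):
    """Left-justify s in an 8-character field."""
    return s[:WIDTH].ljust(WIDTH)

def extract(s, i, j):
    """Columns i..j of s (1-based), left-justified in the result."""
    return pad(pad(s)[i - 1:j])

def place(s, i, j):
    """Put the first j-i+1 characters of s into columns i..j of a blank field."""
    n = j - i + 1
    return pad(" " * (i - 1) + pad(s)[:n])

def maskadd(a, b):
    """Start from a; fill columns that are blank in a with b's character."""
    return "".join(cb if ca == " " else ca for ca, cb in zip(pad(a), pad(b)))

def masksub(a, b):
    """Start from a; blank every column where b holds a non-blank character."""
    return "".join(" " if cb != " " else ca for ca, cb in zip(pad(a), pad(b)))

def find(s, ch):
    """1-based column of the first ch in s, or 0 if absent."""
    return pad(s).find(ch) + 1

def notfind(s, ch):
    """1-based column of the first character that is not ch, or 0."""
    for k, c in enumerate(pad(s), start=1):
        if c != ch:
            return k
    return 0

for ch in ("abcdefgh", "12345678", "87654321", "hgfecba"):
    ch2 = extract(ch, 3, 4)
    ch3 = place(extract(ch, 7, 8), 1, 2)
    ch4 = place(ch, 7, 8)
    print(repr(ch2), repr(ch3), repr(ch4),
          repr(maskadd(ch3, ch4)), repr(masksub(ch, ch2)),
          find(ch, "1"), notfind(ch, " "))
```

Printing with repr makes the blank columns visible, which the LIST output does not.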
GEN DCALLkey() $ dynamically call TG routine 90-94
key must be DRANV, DRANW, DRANX,
DRANY, DRANZ
The maximum LAG__ is 99.
The command GEN PASS(' ') passes text to
the B34S low level command parser without parsing.
To assist FORTRAN 77 users, the following aliases are
provided: DSQRT=SQRT, DLOG=LOG, DLOG10=LOG10, DEXP=EXP,
DASIN=ASIN, DCOS=COS, DABS=ABS, SIN=DSIN, ABS=DABS.
Statistical functions in B34S.
GEN XNEW = PROBIT(XOLD) $ XNEW = inverse normal of XOLD. 89
XOLD must be in range (0.0 1.0)
GEN XNEW = PROBNORM(XOLD) $ XNEW = probability of normal 72
distribution. Can be
calculated as
.5+.5*DERF(XOLD/SQRT(2.0))
GEN XNEW = NORMDEN(XOLD) $ XNEW = density of normal
distribution. XNEW =
exp(-1.*z*z/2)/dsqrt(2*pi)
GEN XNEW = BETAPROB(xold1,xold2,xold3) $ Computes probability XNEW 113
that a variable having a beta
distribution having parameters xold2
and xold3 is LE xold1. xold2 and
xold3 must be GT 0
GEN XNEW = INVBETA(xold1,xold2,xold3) $ Inverse of beta distribution 114
xold1 is probability.
GEN XNEW = CHISQPROB(xold1,xold2) $ xnew is probability that a variable 115
having a chi-squared distribution
with xold2 degrees of freedom is
LE xold1. .5 LE xold2 LE 200000
xold1 GE 0.0
GEN XNEW = INVCHISQ(xold1,xold2) $ Inverse chi-squared. 116
0 le xold1 le 1.0.
.5 le xold2 le 200000
GEN XNEW = NCCHISQ(xold1,xold2,xold3)$ Non central chi-square 121
xold1 variable GE 0 for which to
calculate probability
xold2 degrees of freedom (GE .5)
xold3 noncentrality
.5 LE (xold2+xold3) LE 200000
GEN XNEW = FPROB(xold1,xold2,xold3) $ F distribution probability 117
xold1 = F value (GE 0)
xold2 = df numerator (GT 0)
xold3 = df denominator (GT 0)
GEN XNEW = INVFDIS(xold1,xold2,xold3)$ Inverse F distribution 118
xold1 = probability in range (0 1)
xold2 = df numerator (GT 0)
xold3 = df denominator (GT 0)
GEN XNEW = TPROB(xold1,xold2)$ Probability of t distribution 119
xold1 = t value
xold2 = df (GT 0)
if xold1 = 1.966 and xold2 = 10000 we get
.95
GEN XNEW = INVTDIS(xold1,xold2)$ Inverse t distribution 120
xold1 = probability
xold2 = df (GT 0)
Note: At low levels of probability, the INVFDIS command may fail due
to overflows.
13.3 Notes on Analytic statements
In addition to the function driven GEN statements, B34S also allows
analytic GEN statements. Since analytic GEN statements are expanded by
the B34S parser, there is no longer a 1 to 1 correspondence between GEN
statements and the underlying TG statements. Hence the GEN statements
UP, BACKWARD, DOWN or FORWARD may not work as expected if any analytic
statements are within the jump range. Analytic GEN statements parse
slower but are easier to use. The operators +, -, *, / and ** are
allowed. In addition IF-THEN structures are allowed. Examples of
analytic statements are given below. All variables must have
been mentioned on an INPUT or BUILD sentence. Statements are evaluated
from left to right. Expansion first takes place inside ( ). A limited
number of B34S functions are allowed inside the analytic statement.
Function arguments themselves can be analytic statements.
gen x=x*x$
gen y=(x+2.)/(j + kkk)$
gen vv = xx - jj + kk $
gen x =(x-lag(x))/lag(x) $
gen x =sin(y*q)/2.0 $
gen test5=log(exp(log(exp(5.0))))$
Note: Caution for users if CALMATH is in effect.
Users are cautioned to make use of ( ) to be sure that an analytic
statement is evaluated as desired. For example use
x = ( y * (p + q))$
type statements. The order of B34S doing evaluations is to first make
all constants temp variables of the form ######__ where __ goes from
1-40. Next any functions such as SIN are evaluated in place. Next the
lowest ( ) is found and evaluated. After all ( ) are reduced, the
expression is evaluated from left to right in a manner similar to a
simple hand-held calculator. This convention differs from the FORTRAN
convention that first evaluates **, then * and /, and finally + and -.
Users are cautioned to carefully check the results of all analytic
calculations by listing the data with the B34SEXEC LIST option (possibly
with IEND=10 if there are too many observations), before proceeding to
the analysis. The user can inspect the STACK during expansion of a GEN
sentence by placing the following line first in the control file
b34sexec options debugsubs(pstack)$ b34seend$
Examples of statements where unintended results can occur are given
next. Using the B34S CALMATH option, the expression
gen a=17+16*3**2$
results in a=9801. Using ( ) the expression can be written as
gen a9801=(((17+16)*3)**2)$
which will resolve to 9801 no matter whether CALMATH or FORTMATH is in
effect. Note that with CALMATH in effect B34S evaluates the expression
as if it was entered into a calculator from left to right. This is in
contrast to FORTRAN which would evaluate the expression as if it were
written
gen a = 17+(16*(3**2))$
and obtain a=161.
The expression
gen b = 12-3*4/2$
using the FORTMATH convention would be seen as if it were written
gen b = 12-((3*4)/2)$
and would find b=6 while B34S using the CALMATH option would evaluate
gen b = (12-3)*4/2$
and would find b=18.
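The two conventions can be compared outside B34S with a short Python sketch. This is an illustration only and not the B34S parser. The function names calmath and fortmath are hypothetical, the left to right evaluator handles only unsigned constants without ( ), and Python's own operator precedence is used as a stand-in for the FORTRAN hierarchy since the two agree for +, -, *, / and **.

```python
import re

def calmath(expr):
    """Evaluate +, -, *, / and ** strictly left to right (no precedence)."""
    tokens = re.findall(r"\d+\.?\d*|\*\*|[-+*/]", expr)
    value = float(tokens[0])
    # tokens alternate: number, operator, number, operator, ...
    for op, num in zip(tokens[1::2], tokens[2::2]):
        rhs = float(num)
        if op == "+":
            value += rhs
        elif op == "-":
            value -= rhs
        elif op == "*":
            value *= rhs
        elif op == "/":
            value /= rhs
        else:  # "**"
            value **= rhs
    return value

def fortmath(expr):
    """Python precedence matches FORTRAN here: **, then * and /, then + and -."""
    return float(eval(expr, {"__builtins__": {}}))

for e in ("17+16*3**2", "12-3*4/2"):
    print(e, calmath(e), fortmath(e))
```

For the two expressions discussed above, the left to right evaluator gives 9801 and 18 while the precedence based evaluator gives 161 and 6.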
A good rule to follow is use ( ) so that it is clear what is desired.
While many "old style" programs follow the FORTRAN approach, the
developer of B34S has found that many modern users who do not have a
programming background obtain unintended results using the FORTRAN
programming convention. As a result B34S was designed to allow the
simpler "calculator" approach.
Users have told the developer that it is best to force the FORTRAN
convention on users. This suggestion has been taken. As of now so as not
to confuse FORTRAN users, the FORTMATH option is the default for all GEN
statements. This option forces the B34S GEN statement parser to resolve
from left to right, first replacing FUNCTIONS, next replacing ( )
expressions, next replacing ** expressions, next replacing * and /
expressions and finally replacing + and - expressions. The B34S MATRIX
command follows the FORTRAN convention 100%. Just the discussion of the
differences between the CALMATH and the FORTMATH option alerts users to
how an expression is parsed.
All power functions require a positive base. For example if X=-2.0
Y = X**Q is not allowed. The reason for this "harsh" limit is that while
(-2)**3 = -8, (-2)**3.00000000000001 enters the complex domain. If the user
knows that the exponent will be an integer, the code
gen y=abs(x)**3$
gen if(x.lt.0.0)y=-1*y$
or in the general case
gen y=abs(x)**e$
gen if(x.lt.0.0.and.dint(e/2).ne.(e/2))y=-1*y$
will give the desired result.
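The logic of the general case can be restated in Python for checking by hand. This is a sketch, not B34S code. The function name signed_power is hypothetical, and as in the GEN statements above the result is only meaningful when the exponent is a whole number.

```python
import math

def signed_power(x, e):
    """abs(x)**e, negated when x is negative and e is an odd integer."""
    y = abs(x) ** e
    half = e / 2
    # e/2 not being whole means e is odd, so the sign of x carries through
    if x < 0.0 and math.trunc(half) != half:
        y = -1 * y
    return y

if __name__ == "__main__":
    print(signed_power(-2.0, 3))  # odd exponent keeps the negative sign
    print(signed_power(-2.0, 2))  # even exponent gives a positive result
```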
Multiple operations.
GEN statements should resolve to be in the form of an assignment
gen x = value;
after an analytic statement such as
gen x = y*z;
which is seen as
" variable operator variable "
A statement of the form
gen x = -y+x;
will not work as intended since it is seen as
"operator variable operator variable"
The correct form is
gen x = -1. * y + x;
For example the statements
b34sexec options ginclude('gas.b34'); b34srun;
b34sexec data set;
build yy;
gen yy=-gasout*gasin;
b34srun;
Produces the following error message after gasout*gasin was resolved to
a temp variable:
Note: FORTRAN calculation hierarchy used in GEN sentence.
ERROR: Argument error to analytic expression.
Check for multiple operators or = signs.
Problem was with:
= - ######01
Examples of GEN statements
User wants to delete all observations where X=4.0, Y=10.0 or Z=6.0
gen if(x.eq.4.0.or.y.eq.10.0.or.z.eq.6.0)x=delobs()$
User wants to delete all observations where X is missing.
gen x=difmissing(x)$
User wants to build lag variables. On DATA step set MAXLAG=n
where n is the maximum lag.
gen xlag1=lag(x)$
gen xlag9=lag9(x)$
If the user knows in advance that all X variables are positive, the
below listed code will delete the appropriate number of observations
since it makes use of the fact that all variables are initialized
to zero by B34S and a lag before the start of the data will be set
to zero. The user will be warned if this type of approach is used.
The developer recommends that the MAXLAG approach be used since
X may actually have zero values.
gen xlag1=lag(x)$
gen xlag9=lag9(x)$
gen del(xlag9)$
The statement GEN LX = LAGn(X)$ places the nth lag of X in LX.
B34S uses the current value of X at that point as does SAS. Usually
this is not a problem. The below listed code illustrates what occurs.
b34sexec data noob=10 maxlag=1 head $
build x lagx lagx2$
gen x=kount() $
gen lagx = lag(x)$
gen x=x+100$
gen lagx2= lag(x)$
b34srun$
b34sexec list$ b34srun $
Produces output:
OBS X LAGX LAGX2
1 101.00000 0.00000000 100.00000
2 102.00000 1.0000000 101.00000
3 103.00000 2.0000000 102.00000
4 104.00000 3.0000000 103.00000
5 105.00000 4.0000000 104.00000
6 106.00000 5.0000000 105.00000
7 107.00000 6.0000000 106.00000
8 108.00000 7.0000000 107.00000
9 109.00000 8.0000000 108.00000
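One reading that reproduces the listing above is that each LAG statement remembers the value its argument had when that statement was last executed. The Python sketch below models this reading. It is an interpretation of the output and not the B34S implementation. The class LagSite and the function run are hypothetical names, and the sketch assumes that KOUNT() is zero on the observation dropped by MAXLAG=1 and that lag memory starts at zero.

```python
class LagSite:
    """One LAG statement: returns the value it saw on the previous call."""

    def __init__(self):
        self.previous = 0.0  # assumed: lag memory starts at zero

    def __call__(self, current):
        out, self.previous = self.previous, current
        return out

def run(noob=10, maxlag=1):
    lag_a, lag_b = LagSite(), LagSite()  # one per GEN ... LAG(X) statement
    rows = []
    for kount in range(1 - maxlag, noob - maxlag + 1):
        x = float(kount)       # gen x = kount()
        lagx = lag_a(x)        # gen lagx = lag(x)
        x = x + 100.0          # gen x = x + 100
        lagx2 = lag_b(x)       # gen lagx2 = lag(x)
        if kount >= 1:         # the first MAXLAG observations are dropped
            rows.append((x, lagx, lagx2))
    return rows

for obs, row in enumerate(run(), start=1):
    print(obs, *row)
```

Under these assumptions LAGX lags the value of X before 100 was added while LAGX2 lags the value after, which is the pattern shown in the listing.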
13.4 User Control of Data Building supported in MVS and CMS only.
The GEN Function DCALL allows the user to customize B34S to perform
specialized data building functions available in FORTRAN. DCALL
allows a dynamic branch. All routines have the same argument list.
Below is an example where variables 1 - 10 are divided by SQRT
of variable 11.
MVS SETUP
/*jobparm r=4096,t=1
// exec fortvce
//fort.sysin dd *
subroutine dranv(dummy,ikount,kkount,kount,noob,obno,onnb,iflag,
*inew,kode,ivar,bdex,mm,ilpmx,iotest,idbug)
real*8 dummy(99),obno,onnb
c
do 100 i=1,10
100 dummy(i) = dummy(i) / dsqrt(dummy(11))
return
end
//lked.syslib dd dsn=bec4346.#b34s.vsload,disp=shr,label=(,,,in)
// dd dsn=sys1.vsfort.vlnkmlib.ver23,disp=shr
// dd dsn=sys1.vsfort.vfortlib.ver23,disp=shr
//lked.syslmod dd dsn=&&lib2(dranv),disp=(mod,pass),
// space=(trk,(2,2,2)),unit=scratch
// exec b34s
//go.b34slib dd dsn=&&lib2,disp=(old,delete)
//go.sysin dd *
b34sexec data unit=10$
input x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11$
gen dcalldranv() $
b34seend$
CMS SETUP
Step one: Compile DRANV FORTRAN. At UIC this is done with command:
FORTVCE DRANV (OPT(3)
Step two: Allocate file. At UIC this is done with command:
FILEDEF B34SLIB DISK DRANV MODULE
Step three: Run B34S the usual way with B34S CMS EXEC.
Discussion of arguments
DUMMY = array of size 99 containing current observation.
IKOUNT = total number of observations read so far
KKOUNT = IKOUNT - obs deleted by IFLAG
KOUNT = KKOUNT - number deleted by MAXLAG
NOOB = number of observations
OBNO = REAL*8 version of NOOB
ONNB = OBNO - 1.0
IFLAG = set = 0 to keep observation, set = 1 to delete
IDBUG = set to 0 usually. Can be set by DEBUG
At the present time the DCALL feature works only on CMS and MVS. For
complex data transformations it is suggested that the data be moved
to the matrix command for further processing and then the series moved
back. Another possibility is to use SAS to build the data. For large
cross section work, the SAS option is often used. For complex time
series applications, the MATRIX command approach may be the way to
proceed.
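For readers without access to the MVS or CMS FORTRAN setup, the per observation transformation performed by the DRANV example above can be sketched in Python. This is only an illustration of the arithmetic. The function takes just the DUMMY array and ignores the other DCALL arguments, and it is not a replacement for the DCALL interface.

```python
import math

def dranv(dummy):
    """Divide variables 1-10 of one observation by the square root of variable 11."""
    scale = math.sqrt(dummy[10])  # variable 11 (Python lists count from 0)
    for i in range(10):
        dummy[i] = dummy[i] / scale
    return dummy

if __name__ == "__main__":
    observation = [2.0] * 10 + [4.0]  # made-up values for x1-x10 and x11
    print(dranv(observation))
```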
13.5 Random Number Generation and LP Capability
The FUNCTION BACKSPACE allows B34S to generate NOOB random numbers by
passing only one data card and building NOOB data points. For example:
b34sexec data noob=200$
input rr $ build rn1 rn2 $
gen backspace()$
gen rr = rec() $
gen rn1 = rn() $
gen rn2 = rn() $
datacards$
1111.0
b34sreturn$
b34seend$
will generate 200 observations where RR is a random rectangular variable
and RN1 and RN2 are random normal variables.
In most cases, reading only one observation over and over does
not gain the user anything. A better way to proceed is to use a
job with only a BUILD sentence. This will run faster. An example is
given below:
b34sexec data noob=200$
build rr rn1 rn2 y$
gen rr = rec() $
gen rn1 = rn() $
gen rn2 = rn() $
gen y = 100.0 + (.5 * rr) + (.3 * rn1) + rn2$
b34seend$
In the above job, RR is a rectangularly distributed variable in the
range 0.0 - 1.0, while RN1 and RN2 are random normal variables with mean
0.0 and sd 1.0.
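The structure of the BUILD only job can be mirrored in Python for users who want to compare results across generators. The sketch below uses Python's own random module and not the IMSL routines, so the numbers will not match B34S output. The function name build_sample and the seed value are arbitrary choices made for the illustration.

```python
import random

def build_sample(noob=200, seed=12345):
    """Build NOOB observations of (rr, rn1, rn2, y)."""
    rng = random.Random(seed)  # the seed value is arbitrary
    rows = []
    for _ in range(noob):
        rr = rng.random()            # rectangular on [0, 1)
        rn1 = rng.gauss(0.0, 1.0)    # normal, mean 0.0 and sd 1.0
        rn2 = rng.gauss(0.0, 1.0)
        y = 100.0 + (0.5 * rr) + (0.3 * rn1) + rn2
        rows.append((rr, rn1, rn2, y))
    return rows

if __name__ == "__main__":
    sample = build_sample()
    mean_y = sum(row[3] for row in sample) / len(sample)
    print(len(sample), mean_y)
```

Fixing the seed makes the run repeatable, which is useful when results from different generators or seeds are compared.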
The default is to have the REC and RN commands set to call
GGUBS and GGNML which were obtained from the 8th edition of IMSL. If
other routines are desired, they can be set using the OPTIONS commands
RECVER and RNVER.
The current autoexec.b34 file resets these to IMSL_1 and DRNNOA.
For example:
RECVER(GGUBS) => IMSL routine GGUBS, the default.
RECVER(RAND) => the old IBM RAND routine.
RECVER(RAN1) => Numerical Recipes RAN1 (see page 196).
RECVER(RAN2) => Numerical Recipes RAN2 (see page 197).
RECVER(RAN3) => Numerical Recipes RAN3 (see page 199).
RECVER(FORT90) => Fortran 90 Random number generator
RECVER(IMSL_1) => IMSL Version 10 16807 Generator
RECVER(IMSL_2) => IMSL Version 10 16807 Generator Shuffled
RECVER(IMSL_3) => IMSL Version 10 397204094 Generator
RECVER(IMSL_4) => IMSL Version 10 397204094 Generator Shuffled
RECVER(IMSL_5) => IMSL Version 10 960706376 Generator
RECVER(IMSL_6) => IMSL Version 10 960706376 Generator Shuffled
RECVER(IMSL_7) => IMSL Version 10 Recursion option
RNVER(GGNML) => IMSL GGNML routine, the default.
RNVER(GRAND) => GRAND routine.
RNVER(GASDEV) => Numerical Recipes GASDEV (page 203).
RNVER(GASDEV2) => Numerical Recipes GASDEV2 (page 203).
RNVER(GASDEV3) => Numerical Recipes GASDEV3 (page 203).
RNVER(GASDEV4) => FORT90 random number generator and GASDEV.
RNVER(DRNNOA) => IMSL-10 Acceptance/rejection generator
RNVER(DRNNOR) => IMSL-10 Inverse CDF Generator
The GASDEV2 and GASDEV3 routines are modifications of GASDEV to call
RAN2 and RAN3 respectively. The RAN1 is probably the best of the three.
RAN2 is fast but has the limit that it produces one of only 714025
possible values. RAN3 is a "portable" routine. In Monte Carlo work,
the possibility of experimenting with different generators and different
seeds is important. The OPTIONS command SETSEED allows different
starting values to be used. For further discussion of some of the
considerations in the selection of a random generator, see Numerical
Recipes by Press, Flannery, Teukolsky and Vetterling, Cambridge University
Press, 1989, chapter 7.
The B34S MATRIX Command allows use of these generators. If RECVER
and RNVER are set globally, they will also be set for the MATRIX facility.
However local changes can be made to the generators used in MATRIX command
programs. The default settings are GGUBS and GGNML which should be
logically the same as the IMSL_1 and DRNNOR settings. The GGUBS and
GGNML programs are available on all platforms. Extensive documentation
for the random number generators is given in the IMSL documentation.
The MATRIX command has been designed with serious numerical Monte
Carlo calculations in mind.
Note: There is a limit of 999 GEN statements that do not use LAG or
LP functions. The max number of LP and LAG functions is 150.
Linear processing with LP function.
B34S allows the user to calculate a complex linear process of the form
XNEW = AR(XNEW) + MA(XOLD) where AR is an autoregressive
weighting function starting with lag 1 and MA is a moving average
weighting function starting with lag 0. The general form of the command
is
gen xnew = lp(nar,nma,xold) $
where NAR = number of AR terms (max 10),
NMA = number of MA terms (max 10),
XOLD= moving average variable
Options and parameters for GEN LP function include
GEOMETRIC - generate geometric MA weights
PASCAL - to generate pascal weights
LAMDA=r1 - sets lamda weight for geometric and pascal.
Default = .5. Maximum of 3 digits allowed.
IORDER=n1 - Order for pascal weights. Default = 2.
JUMP=n2 - # of weights truncated off left hand side.
OVCON=n3 - Overflow index for AR system. The output is
tested against 10**n3. If more than 10 overflows
occur, the problem is stopped. OVCON defaults to
10.
AR=(A1,A2) - Inputs AR weights if NAR ne 0.
MA=(M1,M2) - Inputs MA weights if NMA ne 0 and GEOMETRIC and
PASCAL not set.
VALUES(val1,val2) - inputs initial values if NAR ne 0.
Example 1. Use third-order PASCAL weights for 7 terms with lamda = .4.
gen xnew=lp(0,7, xold) pascal iorder=3 lamda=.4 $
Example 2. Use 4 AR and 2 MA weights.
gen xnew=lp(4,2,xold) ar(.4,.3,.2,.1) ma(.5,.5) ovcon=100
values=(100.0,98.0,97.0,100.0) $
Usage note: The GEOMETRIC and PASCAL options assign MA=0.0 unless the
JUMP sentence is used.
Example 3. Uses LP command to generate Box-Jenkins Models
NORM = a random series
NORMD1 = the first difference of NORM, x(t) = (1.0 - B)e(t)
AR3 = an AR model having form (1.0 -.9B +.4B**2 +.3B**3)x(t)=e(t)
MA3 = an MA model having form x(t)=(1.0-.7B+.8B**2+.4B**3)e(t)
ARMA = an ARMA model having form
(1.0-.5B)x(t)=(1.0-.7B+.8B**2)e(t)
/$$ tests bjest
b34sexec data noob=3000 nohead heading('random and first diff')
maxlag=3$
build norm normd1 ar3 ma3 arma$
gen norm=rn()$
gen normd1=norm-lag1(norm)$
gen ar3=lp(3,1,norm) ar(.9 -.4 -.3) ma(1.) values(0.0 0.0 0.0)$
gen ma3 =lp(0,4,norm) ma(1. -.7 .8 .4) values(0.0)$
gen arma=lp(1,3,norm) ar(.5) ma(1. -.7 .8) values(0.0)$
b34srun$
b34sexec bjiden$
var norm normd1 ar3 ma3 arma$
seriesn var=norm name='white noise'$
seriesn var=normd1 name='white noise differenced'$
seriesn var=ar3 name='ar(3) '$
seriesn var=ma3 name='ma(3) '$
seriesn var=arma name='arma(1,2) '$
rauto norm normd1 ar3 ma3 arma$
b34srun$
b34sexec bjest$
model normd1$
modeln p=1 avepa=.1$
forecast nf=10 nt=2995$ b34srun$
b34sexec bjest$
model ar3 $
modeln p=(1,2,3) avepa=.1$
forecast nf=10 nt=2995$ b34srun$
b34sexec bjest$
model ma3 $
modeln q=(1,2,3) avepa=.1$
forecast nf=10 nt=2995$ b34srun$
b34sexec bjest$
model arma $
modeln p=1 avepa=.1 q=(1,2)$
forecast nf=10 nt=2995$ b34srun$
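The linear process XNEW = AR(XNEW) + MA(XOLD) used in the LP examples above can be written out in Python. This sketch follows the definition given in the text (AR weights start at lag 1, MA weights start at lag 0) and is not the B34S routine. It omits the GEOMETRIC, PASCAL, JUMP and OVCON options. The treatment of start-up values is an assumption: values[0] is taken as the XNEW value one period before the sample, values[1] two periods before, and XOLD is taken as zero before the sample.

```python
def lp(ar, ma, xold, values=()):
    """xnew(t) = sum_i ar[i]*xnew(t-i) for i >= 1, plus sum_j ma[j]*xold(t-j) for j >= 0."""
    xnew = []
    for t in range(len(xold)):
        total = 0.0
        for i, a in enumerate(ar, start=1):      # AR part starts at lag 1
            if t - i >= 0:
                total += a * xnew[t - i]
            elif i - t - 1 < len(values):
                total += a * values[i - t - 1]   # assumed pre-sample value
        for j, m in enumerate(ma):               # MA part starts at lag 0
            if t - j >= 0:
                total += m * xold[t - j]
        xnew.append(total)
    return xnew

if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    # impulse response of the AR3 model in Example 3
    print(lp([0.9, -0.4, -0.3], [1.0], impulse, values=(0.0, 0.0, 0.0)))
    # impulse response of the MA3 model in Example 3
    print(lp([], [1.0, -0.7, 0.8, 0.4], impulse))
```

Feeding a unit impulse through the function returns the weights of the process, which is a quick way to confirm the signs of the AR and MA terms before running BJEST.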
13.6 Examples of DATA Paragraph
The simplest way to input 5 observations on 3 series is:
b34sexec data$
input x y z$
datacards$
1 11 111
2 22 222
3 33 333
4 44 444
5 55 555
b34sreturn$
b34seend$
The above example could be written as
b34sexec data$
input x y z$
datacards$
1 11 111 2 22 222 3 33 333 4 44 444 5 55 555
b34sreturn$
b34seend$
since if NOOB is NOT set the default reading option is FILEF=@@.
If NOOB is set explicitly, FILEF=FREE by default. This will read
faster.
b34sexec data noob=5$
input x y z$
datacards$
1 11 111
2 22 222
3 33 333
4 44 444
5 55 555
b34sreturn$
b34seend$
If NOOB is set and the data is all on one row or broken up, then
FILEF=@@ must be set.
b34sexec data noob=5 filef=@@$
input x y z$
datacards$
1 11 111 2 22 222 3 33 333 4 44 444 5 55 555
b34sreturn$
b34seend$
It is to be noted that FILEF=@@ is the most flexible reading format but
it also is the slowest. FILEF=FREE is faster, but is slower than
FILEF=FIXED. FILEF=DP is the fastest.
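The difference between reading one observation per row and reading a continuous stream of values can be sketched in Python. This is a simplified model and not the B34S reader: it assumes that FILEF=FREE takes one observation from each row and that FILEF=@@ ignores row boundaries. The function names read_free and read_stream are hypothetical.

```python
def read_free(lines, nvar):
    """One observation per row (assumed FILEF=FREE behaviour)."""
    return [[float(tok) for tok in line.split()[:nvar]] for line in lines]

def read_stream(lines, nvar):
    """Ignore row breaks; every nvar values form one observation (@@ style)."""
    values = [float(tok) for line in lines for tok in line.split()]
    return [values[k:k + nvar] for k in range(0, len(values) - nvar + 1, nvar)]

by_rows = ["1 11 111", "2 22 222", "3 33 333", "4 44 444", "5 55 555"]
one_row = ["1 11 111 2 22 222 3 33 333 4 44 444 5 55 555"]

print(read_free(by_rows, 3))     # five observations
print(read_stream(one_row, 3))   # the same five observations
print(read_free(one_row, 3))     # only one observation is found
```

In this model the stream reader recovers all five observations from either layout, while the row reader sees only the first observation when all values share one row.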
Column loading of data (filef=cfixed) is illustrated next
b34sexec data filef=cfixed noob=3;
input x(1,1,3) y(1,4,4);
datacards;
1234
4321
9998
b34sreturn;
b34srun;
b34sexec list; b34srun;
b34sexec data filef=cfixed;
input x(1,1,3) y(1,4,4);
datacards;
1234
4321
7778
b34sreturn;
b34srun;
b34sexec list; b34srun;
b34sexec data filef=cfixed;
input x(1,1,3) y(1,4,6);
datacards;
1234
4321
999 .
b34sreturn;
b34srun;
b34sexec list; b34srun;
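The column loading in the first FILEF=CFIXED example can be mimicked with string slicing in Python. The sketch assumes that the three numbers after each variable name are the record number, the first column and the last column, so that X(1,1,3) reads columns 1 to 3 and Y(1,4,4) reads column 4. This reading is an assumption based on the example. The function name read_cfixed is hypothetical, only one record per observation is handled, and short or missing fields are not covered.

```python
def read_cfixed(lines, fields):
    """fields maps name -> (record, first_col, last_col), all 1-based."""
    obs = []
    for line in lines:
        row = {}
        for name, (_record, first, last) in fields.items():
            row[name] = float(line[first - 1:last])  # slice the column range
        obs.append(row)
    return obs

if __name__ == "__main__":
    layout = {"x": (1, 1, 3), "y": (1, 4, 4)}
    for row in read_cfixed(["1234", "4321", "9998"], layout):
        print(row)
```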
The next example shows data building.
b34sexec data noob=5$
input x1 y1 z $
build sumy1z $
gen sumy1z = y1+z$
datacards$
11 22 33
33 44 33
333 3.0 666
22 33 11
11.2 33.4 22.2
b34sreturn$
b34seend$
b34sexec regression$
model x1 = sumy1z$
b34srun$
/$
/$ we reload the data and build some more data.
/$ note that prior step ends with b34srun$, not b34seend$.
/$ optional labels have been supplied.
/$
b34sexec data set$
build x1ty1 x1py1 xx$
label x1ty1 = 'x1 * y1 '$
label x1py1 = 'x1 + y1 '$
label xx = '(x1**3)/(sin(x1)+(2.0*cos(x1/y1)))'$
gen x1ty1 = x1*y1$
gen x1py1 = x1+y1$
gen xx = (x1**3)/(sin(x1)+(2.0*cos(x1/y1)))$
b34seend$
/$
/$ we list the data
/$
b34sexec list$ b34srun$
Data loading from a file on PC
Assume X1,...,X6 are on a file MYDATA.DAT. The below listed statements
will load the series and perform a regression after building data. Since
there is no FILEF= parameter, FREE is assumed. If more than one
observation is placed on one card, use FILEF=@@.
b34sexec options open('mydata.dat') unit=10 disp=old$ b34seend$
b34sexec data unit(10)$
input x1 x2 x3 x4 x5 x6$ build x1sq x1tx2$
gen x1sq =x1*x1$
gen x1tx2 =x1*x2$
b34seend$
b34sexec regression$ model x6=x1 x2 x3 x4 x5 x1sq $ b34seend$
The same job can be coded with the file statement as:
b34sexec data file('mydata.dat')$
input x1 x2 x3 x4 x5 x6$ build x1sq x1tx2$
gen x1sq =x1*x1$
gen x1tx2 =x1*x2$
b34seend$
b34sexec regression$ model x6=x1 x2 x3 x4 x5 x1sq $ b34seend$
13.7 Reading DMF Files
The B34S DATA paragraph can be used to load data from a B34S DMF file.
Section 40.0 discusses how to create and maintain B34S DMF files. Since
B34S has a limit of 98 variables and one constant, the DMF data library
is provided as a means by which to store large numbers of data series
and selectively read series into B34S. The current maximum number of
series in a DMF file is 9999 although this can change in future
releases. There can be multiple members in a DMF library. Members
are selected using the DMFMEMBER( ) parameter on the DATA sentence.
If DMFMEMBER( ) is not specified, the first member is read.
Assume that there are 880 series in B34S DMF file MYDATA.DMF. It is
desired to load series X, Y, Z from the first member. The following
commands will load the series:
b34sexec options open('c:\mysd\mydata.dmf') unit(60)$ b34seend$
b34sexec data filef=dmf unit(60)$
input x y z$
b34seend$
If MYDATA.DMF was a formatted DMF file, the correct commands would be:
b34sexec options open('c:\mysd\mydata.dmf') unit(60)$ b34seend$
b34sexec data filef=fdmf unit(60)$
input x y z$
b34seend$
Note: As of version 7.12g, b34s has been modified to detect whether the
file is formatted or unformatted. Hence FILEF=DMF or FILEF=FDMF
can be used.
If member CRIME was to be loaded the above two examples would be:
b34sexec options open('c:\mysd\mydata.dmf') unit(60)$ b34seend$
b34sexec data filef=dmf unit(60) dmfmember(crime)$
input x y z$
b34seend$
If MYDATA.DMF was a formatted DMF file, the correct commands would be:
b34sexec options open('c:\mysd\mydata.dmf') unit(60)$ b34seend$
b34sexec data filef=fdmf unit(60) dmfmember(crime)$
input x y z$
b34seend$
In place of the open statements, the following can be used:
b34sexec data filef=fdmf file('c:\mysd\mydata.dmf')
dmfmember(crime)$
input x y z$
b34seend$
If MYDATA.DMF contained 80 series and the user wanted to load all 80
series without using an INPUT statement, the correct setup would be
just to omit the INPUT statement. If the DMF file contained more than
98 series and the INPUT statement was omitted, only the first 98 series
would be read.
The IBEGIN and IEND options on the DATA sentence control whether all
observations from the DMF file are read. Assume that the MYDATA.DMF file
contains 2000 observations but the user wants to load only from
observation 23 to observation 1023. The correct commands would be
b34sexec options open('c:\mysd\mydata.dmf') unit(60)$ b34seend$
b34sexec data filef=dmf unit(60) dmfmember(crime)
ibegin=23 iend=1023$
input x y z$
b34seend$
assuming the member name was CRIME and only X Y Z were desired.
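The effect of IBEGIN and IEND can be pictured as a slice of the observations. The Python sketch below assumes that both end points are included, as the phrase "from observation 23 to observation 1023" suggests, and the function name select_range is hypothetical.

```python
def select_range(observations, ibegin, iend):
    """Keep observations ibegin..iend, counting from 1 and including both ends."""
    return observations[ibegin - 1:iend]

if __name__ == "__main__":
    obs_numbers = list(range(1, 2001))           # 2000 observations in the file
    kept = select_range(obs_numbers, 23, 1023)
    print(kept[0], kept[-1], len(kept))
```

Under this assumption the example above loads 1001 observations.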
13.8 Data Bank Options
Note: The NBER data bank option is rarely used these days. It might
be removed from future B34S versions. The DMF facility has
replaced the capability in the BANK command.
BANK sentence options.
READ - Sets up to read from NBER Data Bank. This is the
default.
MERGE - Sets up to merge data from one bank with data from
another bank.
LISTB - Will list data bank on unit IBNKU and documentation on
unit IDOCU.
NODOC - Will suppress documentation listing. This option is not
recommended. The default is to give documentation.
BANK sentence parameters.
IBNKU=n1 Sets bank unit for data. Default = 19.
IDOCU=n2 Sets documentation bank unit. Default = 20.
IMDU =n3 Sets merge bank unit. Default = 24.
IDOCM=n4 Sets merge documentation unit. Default = 20
IVARC=n5 Sets number of variables read from main bank for a bank
read. If omitted, it defaults to NVAR. If the MERGE option
is specified, IVARC must be set LT NVAR. (NVAR-IVARC)
series will be read from the merge bank.
INDXB=n6 Sets first bank index number to read/write if BI parameter
not used. It is recommended that BI parameter be used.
ISKP =n7 Sets number of observations to skip before reading data in
bank. The NBER data banks at UIC start in 1960.
IY1 =n8 Sets last two digits of initial year. Default = 1.
IP1=n9 Sets frequency per year. Default = 1.
IB1=n10 Sets initial period of data. Default = 1.
IMBCRC=n11 Sets number of records in merge bank. Defaults if
IMKU = 19, 24 or 25.
IMDCRC=n12 Sets number of records in merge documentation. Will
default if IMDU = 20, 26, 23, 27.
IPERD=n13 Sets number of observations per period of merge data.
IMSKP=n14 Sets number of observations to skip for merge data. If
both data banks start in same period,
IMSKP = IPERD * ISKP.
BI=(n1,n2) Sets index numbers of data bank.
MI=(m1,m2) Sets index numbers of merge data bank.
The below listed example lists the monthly, quarterly and yearly banks.
It illustrates all JCL needed at UIC.
/*jobparm t=(,30),l=5,r=2048
// exec b34s
//go.ft24f001 dd dsn=bec4346.#nber.mo.data,disp=shr,label=(,,,in)
//go.ft20f001 dd dsn=bec4346.#nber.mo.docm,disp=shr,label=(,,,in)
//sysin dd *
b34sexec data$
* list monthly bank $
bank listb ibnku = 24 idocu = 20$
b34seend$
// exec b34s
//go.ft19f001 dd dsn=bec4346.#nber.qt.data,disp=shr,label=(,,,in)
//go.ft23f001 dd dsn=bec4346.#nber.qt.docm,disp=shr,label=(,,,in)
//sysin dd *
b34sexec data$
* list qt bank $
bank listb ibnku = 19 idocu = 23$
b34seend$
// exec b34s
//go.ft25f001 dd dsn=bec4346.#nber.yr.data,disp=shr,label=(,,,in)
//go.ft27f001 dd dsn=bec4346.#nber.yr.docm,disp=shr,label=(,,,in)
//sysin dd *
b34sexec data$
* list yr bank $
bank listb ibnku = 25 idocu = 27$
b34seend $
The following example reads series 10 20 30 from one quarterly bank and
10 11 16 from another monthly bank. The user knows the names of these
series and on the reread renames them X1 X2 X3 XX1 XX2 XX3.
b34sexec data noob=100 noconstant nvar=6 ivarc=3 $
bank merge iperd=3 ibnku=19 idocu=23 iskp=8
bi=(10 20 30)
mi=(10,11,16)
imku=24 imdu=20 imskp=24 $
b34seend$
b34sexec data unit=8 filef=dp$
input x1 x2 x3 xx1 xx2 xx3 $
b34seend $