Using a LENGTH Statement - SAS Programming

Even if you have a large amount of storage space, it is prudent to keep your SAS data sets as small as possible. This minimizes your storage needs, makes your backups (you DO back up your files, don't you?) run faster, and reduces the execution time of your programs. In a SAS system file, character and numeric variables are stored quite differently. It's not necessary to go into too many technical details (well, maybe a few) to make some strong recommendations.

The length of a character variable should always be set to the maximum number of characters you need to store all values of the variable. For example, if you store GENDER as 'M' or 'F', you only need one byte of storage. If you read this variable with column input, or with a pointer and informat combination, and read from only one column, the length is automatically set to 1. However, if you use list input like this:

INPUT ID GENDER $ HEIGHT WEIGHT ... ;

then the length of GENDER is set, by default, to 8 bytes. When you do this, you are using eight times as much storage as you need for this variable. If you are not careful, your SAS data set may be many times larger than necessary. To solve this problem, use a LENGTH statement to set the length of the variable before your INPUT statement. The order matters: once SAS has assigned a length to a character variable in a DATA step, a later LENGTH statement cannot change it. The efficient code is:

LENGTH GENDER $ 1;
INPUT ID GENDER $ HEIGHT WEIGHT ... ;
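To see the difference for yourself, here is a minimal sketch that reads the same records twice, once with list input alone and once with a LENGTH statement in front, and then lists the variable attributes with PROC CONTENTS. The data set names DEFAULT_LEN and SHORT_LEN and the two data lines are hypothetical, made up only for illustration.

/* Hypothetical data: with list input alone, GENDER gets 8 bytes */
DATA DEFAULT_LEN;
INPUT ID GENDER $ HEIGHT WEIGHT;
DATALINES;
1 M 68 155
2 F 63 120
;

/* Same input, but GENDER is declared as 1 byte before INPUT */
DATA SHORT_LEN;
LENGTH GENDER $ 1;
INPUT ID GENDER $ HEIGHT WEIGHT;
DATALINES;
1 M 68 155
2 F 63 120
;

/* Compare the Len column for GENDER in the two listings */
PROC CONTENTS DATA=DEFAULT_LEN;
RUN;
PROC CONTENTS DATA=SHORT_LEN;
RUN;

PROC CONTENTS should list GENDER with length 8 in DEFAULT_LEN and length 1 in SHORT_LEN. Because the LENGTH statement is the first place GENDER appears, it also becomes the first variable in SHORT_LEN.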

A $ (dollar sign) after a variable name (e.g., GENDER $) indicates that the variable is a character variable. The 1 is the length, in bytes, for this variable. If you have several variables, all with the same length, you can list them together like this:

LENGTH GENDER RACE INSURED $ 1;
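A single LENGTH statement can also mix groups of character and numeric variables, each group followed by its own length. In this minimal sketch, the data set name ATTRIBS and the variable NAME are hypothetical, added only to show a second character length.

DATA ATTRIBS;
LENGTH GENDER RACE INSURED $ 1   /* three 1-byte character variables */
NAME $ 20                        /* one 20-byte character variable */
HEIGHT WEIGHT 4;                 /* two numerics stored in 4 bytes */
STOP;                            /* define the variables only; write no observations */
RUN;

PROC CONTENTS DATA=ATTRIBS;
RUN;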

It's a good idea to specify the length of every character variable in a SAS program when it is created. This can be done with various statements: LENGTH, INPUT, RETAIN, and ARRAY. Numeric variables are more complicated. The default length for a SAS numeric variable is 8 bytes.
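Here is a minimal sketch of each of those statements assigning a character length at the moment a variable is created. The data set name, the variable names, and the single data line are hypothetical; the comments give the length each statement should produce.

DATA LENGTHS;
LENGTH GENDER $ 1;          /* LENGTH: 1 byte */
ARRAY ANS{3} $ 1 A1-A3;     /* ARRAY: A1-A3 get 1 byte each */
RETAIN GROUP 'CONTROL';     /* RETAIN: length of the constant, 7 bytes */
INPUT ID $ 1-3 GENDER $ 5 @7 (A1-A3) ($1.);   /* INPUT: ID gets 3 bytes from columns 1-3 */
DATALINES;
001 M YNY
;

PROC CONTENTS DATA=LENGTHS;
RUN;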

This does not mean 8 significant figures; it means that 8 times 8, or 64 bits (8 bits per byte), are used to store the number. Numbers in a SAS program are stored in floating-point form, the same way as in many other programming languages such as FORTRAN (remember that?), PL/1, BASIC, or C. The number is made up of a sign bit, an exponent, and a mantissa (the significant digits), all of which are stored; the base of the exponent is implied rather than stored. Eight bytes is equivalent to what used to be called "double-precision" in other languages.
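If you want to look at the bits themselves, the HEX16. format writes the full 8-byte floating-point representation of a numeric value as 16 hexadecimal digits. This is a minimal sketch, assuming an IEEE platform such as Windows or UNIX; mainframe SAS uses the IBM floating-point layout, so the digits there will differ.

DATA _NULL_;
X = 1/3;
Y = -1/3;
PUT X= HEX16.;   /* all 64 bits: sign, exponent, and mantissa */
PUT Y= HEX16.;   /* same exponent and mantissa; only the sign bit changes */
RUN;

The two lines in the SAS log should differ in a single hexadecimal digit, the one that holds the sign bit.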

How many significant digits this gives you varies not only with the computer language you are using, but also with the computer and operating system you are running on. There is considerable controversy concerning the appropriate length of SAS numeric variables. All numeric variables are expanded to 8 bytes in memory in all DATA and PROC steps. If you store a SAS numeric variable in less than 8 bytes, the digits that were dropped are not restored when it is expanded, so you lose precision. Be especially careful of statements such as:

IF X = 1/3 THEN ...;

when you have stored X in less than full precision (8 bytes). Also, be aware that certain statistical procedures (such as multiple regression) may be sensitive to loss of precision. For many purposes, such as integers that are not too large or measurements with only a few significant digits, reducing the length of numeric variables to 4 should not cause you any trouble (under Windows and UNIX, a length of 4 stores every integer up to 2,097,152 exactly). Suffice it to say that reducing any numeric variable to less than 8 bytes requires a bit of care and knowledge.
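Here is a minimal sketch of the IF X = 1/3 trap; the data set name PRECISION and the variables X4 and X8 are hypothetical. Both variables hold 1/3 at full precision while the first DATA step runs, because values are always 8 bytes in memory. X4 is truncated only when it is written to the data set, so the comparison fails once the value is read back.

DATA PRECISION;
LENGTH X4 4;    /* X4 is stored in 4 bytes; X8 keeps the default 8 */
X8 = 1/3;
X4 = 1/3;
RUN;

/* Read the stored values back and compare them with 1/3 */
DATA _NULL_;
SET PRECISION;
IF X8 = 1/3 THEN PUT 'X8 equals 1/3';
ELSE PUT 'X8 does not equal 1/3';
IF X4 = 1/3 THEN PUT 'X4 equals 1/3';
ELSE PUT 'X4 does not equal 1/3';
RUN;

The log should report that X8 equals 1/3 and that X4 does not.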

The LENGTH statement also has a DEFAULT=n option, which sets the length for any newly created numeric variables that are not given a length explicitly. If you are using codes to store information, use character variables rather than numerics. The numbers 0 and 1 take 8 bytes by default; the characters "0" and "1" can easily be read in to use only 1 byte. Use character variables when arithmetic will not be performed on the values. Here is an example incorporating some of these concepts. First the inefficient:

Example – INEFFICIENT

DATA LONG;
INPUT ID 1-3
@4 (Q1-Q10) (1.)
@15 HEIGHT 2.
@17 WEIGHT 3.;
DATALINES;
;

Data set LONG is storing all the variables as numerics at 8 bytes apiece. The total storage length is 13*8 = 104 bytes. By using a LENGTH statement to reduce the precision for HEIGHT and WEIGHT (probably OK to do), and storing variables ID and Q1-Q10 as character, you can reduce the storage to 21 bytes (3 + 10 + 4 + 4), almost a five-fold reduction.

Example – EFFICIENT

DATA SHORT;
LENGTH HEIGHT WEIGHT 4;
INPUT ID $ 1-3
@4 (Q1-Q10) ($1.)
@15 HEIGHT 2.
@17 WEIGHT 3.;
DATALINES;
;

Note that you do not need a LENGTH statement for variables ID and Q1-Q10, since their lengths (3 and 1) are defined in the INPUT statement. However, without a LENGTH statement to specify a length of 4 bytes for HEIGHT and WEIGHT, the default length of 8 would be assigned. Remember that the number of columns from which you read a numeric variable has nothing to do with the internal length, or number of bytes, used to store it.
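The DEFAULT= option mentioned earlier gives another way to write the efficient version: instead of naming HEIGHT and WEIGHT, you make 4 bytes the default for every new numeric variable in the step. The sketch below does that and then sums the stored lengths from the DICTIONARY.COLUMNS table with PROC SQL, which should confirm the 21-byte total. The data set name SHORT2 is hypothetical, and the data lines are omitted as in the examples above.

DATA SHORT2;
LENGTH DEFAULT=4;   /* new numeric variables (HEIGHT, WEIGHT) get 4 bytes */
INPUT ID $ 1-3
@4 (Q1-Q10) ($1.)
@15 HEIGHT 2.
@17 WEIGHT 3.;
DATALINES;
;

/* Add up the stored length of every variable in WORK.SHORT2 */
PROC SQL;
SELECT SUM(LENGTH) AS TOTAL_BYTES
FROM DICTIONARY.COLUMNS
WHERE LIBNAME = 'WORK' AND MEMNAME = 'SHORT2';
QUIT;

Keep in mind that DEFAULT= also shortens any numeric variable you did not intend to shorten, so naming the variables explicitly, as in data set SHORT, is the more careful choice.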


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
