Substituting One Value for Another in a Group of Variables - SAS Programming

Suppose you have 105 variables (X1-X100, A, B, C, D, and E) in SAS data set OLD, and a value of 999 is used to represent missing data. It is a common practice in some other database systems to use values such as 99, 999, etc., to represent missing values. Suppose further that you want to substitute a SAS System missing value (.) for the value of 999. You can proceed as follows:

Example - Hard Way

DATA HARDWAY;
SET OLD;
IF X1=999 THEN X1=.;
IF X2=999 THEN X2=.;
/* ... one statement each for X3 through X99 ... */
IF X100=999 THEN X100=.;
IF A =999 THEN A = .;
IF B =999 THEN B = .;
/* ... one statement each for C and D ... */
IF E =999 THEN E = .;
RUN;

Pretty tedious, eh? You say, "There must be a better way!" And there is: it's called an array. An array is used in the SAS System to represent a list of variables, or elements. The SAS System then allows you to perform an operation, or a set of operations, on the entire list by referring to the array. Using an array, the above program can be rewritten as follows:

Example - Easy Way

DATA EASYWAY;
SET OLD;
ARRAY NARNIA[105] X1-X100 A B C D E;
DO I=1 TO 105;
IF NARNIA[I]=999 THEN NARNIA[I]=.;
END;
DROP I;
RUN;

Here you create an array with the ARRAY statement, name it NARNIA, and define it to represent all 105 variables. (You know where this array name came from if you have young children, and if you've read the C.S. Lewis collection, "The Chronicles of Narnia," to them.) By placing the IF statement in a DO loop and operating on the array NARNIA, which represents 105 variables, you can accomplish with one statement exactly the same objective as the first example.

To repeat ourselves (we do that often because repetition is a cornerstone of good teaching), you simply assign the variables in question to the array, write one of the repetitive lines using the subscripted array name in place of a variable name, and place the code in a DO loop so that it will execute as many times as there are variables in the array. Notice that you should drop the DO loop index (I) from the data set you create. You really don't need to keep it around after it has served its purpose. This is good coding practice in general.
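
If you want to try the recipe above before turning it loose on 105 variables, here is a minimal sketch on a made-up data set. The three rows of data and the shortened variable list (X1-X3, A, and B) are assumptions invented for this illustration; they are not part of the original example.

```sas
/* Hypothetical test data: 5 variables instead of 105, values made up */
DATA OLD;
   INPUT X1 X2 X3 A B;
   DATALINES;
1 999 3 10 999
999 5 6 999 20
7 8 9 30 40
;
RUN;

DATA EASYWAY;
   SET OLD;
   ARRAY NARNIA[5] X1-X3 A B;      /* 5 elements in this small sketch */
   DO I = 1 TO 5;
      IF NARNIA[I] = 999 THEN NARNIA[I] = .;
   END;
   DROP I;                         /* the loop index has served its purpose */
RUN;

PROC PRINT DATA=EASYWAY;
RUN;
```

In the printed output, every cell that held 999 in OLD should now show a period, the SAS System missing value, and the other values should be unchanged.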

In addition to using an array to represent an entire group of variables, you can refer to any one of the variables individually by using a bracketed subscript ([I] in this example). Thus, NARNIA[3] refers to the variable X3, NARNIA[104] refers to the variable D, etc. The time to start using arrays is when you notice that you are coding the same line over and over again, with the only change being the variable name. An alternate way to code the above ARRAY statement is:

ARRAY NARNIA[*] X1-X100 A B C D E;

The * indicates that the SAS System will count the number of variables in the array for you. Of course there are always ways for you to get the count yourself, but here you just take the easy way. The DO loop is:

DO I = 1 TO DIM(NARNIA);

DIM is the dimension function. It returns the number of elements in the array (i.e., the number of variables in the list). Don't worry about the extra CPU time it takes to use this method; it is negligible.
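
Putting the [*] and DIM pieces together, the whole step might look like the sketch below. The one-row data set and the shortened variable list (X1-X3, A, and B) are assumptions made up for this illustration. The VNAME function and the PUT statement are not part of the original example; they are added here only so you can see in the SAS log which variable sits behind each subscript.

```sas
/* Hypothetical one-row data set, used only so the step has something to read */
DATA OLD;
   INPUT X1 X2 X3 A B;
   DATALINES;
999 2 999 4 999
;
RUN;

DATA EASYWAY;
   SET OLD;
   ARRAY NARNIA[*] X1-X3 A B;       /* the SAS System counts the elements */
   LENGTH VARNAME $ 32;             /* holds a variable name for the log */
   DO I = 1 TO DIM(NARNIA);         /* DIM returns 5 for this list */
      VARNAME = VNAME(NARNIA[I]);   /* name of the variable behind subscript I */
      PUT I= VARNAME=;              /* written to the SAS log */
      IF NARNIA[I] = 999 THEN NARNIA[I] = .;
   END;
   DROP I VARNAME;                  /* keep only the original variables */
RUN;
```

The log should pair subscript 1 with X1 and subscript 4 with A, which is the same mapping the bracketed subscripts use. Because nothing in the step hard-codes the count, you can add or remove variables in the ARRAY statement without touching the DO loop.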


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
