Restructuring a SAS Data Set: Creating Multiple Observations from a Single Observation - SAS Programming

When you collect multiple measurements on a subject at different times or under different conditions, you have a choice of how to structure your data set. For example, if you measure X at times 1, 2, and 3, you can choose to have a single observation containing SUBJECT, X1, X2, and X3, where X1 is the value of X at time 1, and so on. Or you can decide to make three observations, each containing SUBJECT, TIME, and X.

For some analyses the first structure is more convenient; for others, the second structure is preferred. In this example, we show you how to convert from the first structure to the second. Suppose you have SAS data set OLD with one observation per subject containing values for three variables, X1-X3, which represent measurements taken at three different times.
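To make the starting point concrete, here is a minimal sketch of how a data set like OLD could be built for experimentation. The data values are hypothetical, chosen only for illustration; the only value taken from this text is X1 = 4 for the first subject, which appears in the walk-through further on.

```sas
/* Hypothetical sample data: one observation per subject.         */
/* Only X1 = 4 for the first subject comes from the text; every   */
/* other value is made up for illustration.                       */
DATA OLD;
   INPUT SUBJECT X1 X2 X3;
   DATALINES;
1 4 5 6
2 7 8 9
;
RUN;

/* List the data set to confirm the one-row-per-subject layout */
PROC PRINT DATA=OLD NOOBS;
RUN;
```

With data like this, OLD has one row per subject and four columns: SUBJECT, X1, X2, and X3.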

You wish to create a SAS data set NEW with three observations per subject (one for each measurement) and a variable TIME identifying the measurement (1, 2, or 3).

This can be easily accomplished using an array as follows:

DATA NEW;
   SET OLD;                /* ① */
   ARRAY XX[3] X1-X3;      /* ② */
   DO TIME = 1 TO 3;       /* ③ */
      X = XX[TIME];        /* ④ */
      OUTPUT;              /* ⑤ */
   END;
   DROP X1-X3;             /* ⑥ */
RUN;

There are a few new things to notice in this program. First, you cannot use the same name for an array and a variable in the same DATA step. In this example you use XX for the array name, so X remains available as the name of the newly created single variable in data set NEW. Next, you create another new variable, TIME, to identify which measurement time each observation in data set NEW represents.

You can use a shortcut and create TIME in the DO loop, and then use it as the index or subscript of the array XX. Let's take some TIME to explain exactly what is happening here.
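For comparison, here is a sketch of the longer form that the shortcut avoids, assuming OLD contains SUBJECT and X1-X3 as described. The separate index variable I is a hypothetical name introduced only for this illustration.

```sas
/* Longer form without the shortcut: a separate loop index I       */
/* (a hypothetical name) drives the loop, and TIME is assigned     */
/* from it explicitly.                                             */
DATA NEW;
   SET OLD;
   ARRAY XX[3] X1-X3;
   DO I = 1 TO 3;
      TIME = I;          /* copy the loop index into TIME */
      X = XX[I];         /* pick the I-th element of the array */
      OUTPUT;            /* write one observation per measurement */
   END;
   DROP I X1-X3;         /* the extra index must be dropped too */
RUN;
```

Using TIME itself as the loop index saves the extra assignment and the extra name in the DROP statement, which is presumably why the shortcut is preferred here.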

The DATA step executes once for each observation read (by the SET statement) from data set OLD ①. On each execution, the following happens. Array XX has three elements, the variables X1-X3 ②. The DO loop runs three times, setting TIME to 1, then 2, and finally 3 ③. On each pass, X is set to the value of the array element XX[TIME] ④, and the OUTPUT statement ⑤ then writes a new observation to data set NEW containing X, the other new variable TIME, and the variable SUBJECT from data set OLD. X1-X3 are not included in data set NEW because they are dropped ⑥.

As an example, during the first pass of the DO loop in the first execution of the DATA step, TIME=1 and X is set to the value of XX[1], which is the value of X1 (4) in the current observation of data set OLD. These values are then written (by the OUTPUT statement) to data set NEW, and the next pass of the DO loop begins with TIME=2, and so on.
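One way to watch this sequence yourself is to add a PUT statement inside the loop so that each pass writes a trace line to the SAS log. The following is only a sketch, assuming OLD contains SUBJECT and X1-X3 as described; the PUT statement and the PROC PRINT step are additions for illustration and are not part of the program being explained.

```sas
/* Same restructuring logic, with a trace line written to the log  */
/* on every pass of the DO loop.                                   */
DATA NEW;
   SET OLD;
   ARRAY XX[3] X1-X3;
   DO TIME = 1 TO 3;
      X = XX[TIME];
      PUT _N_= SUBJECT= TIME= X=;   /* _N_ counts DATA step executions */
      OUTPUT;
   END;
   DROP X1-X3;
RUN;

/* List the result: three observations per subject */
PROC PRINT DATA=NEW NOOBS;
RUN;
```

For the first observation described above, the first trace line should report TIME=1 and X=4, followed by two more lines for the same subject before _N_ advances to 2.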

Now if that wasn't bad (or good) enough, you can do better. Arrays don't come in only one flavor, unidimensional. They can also be multidimensional, yielding much more power and many more headaches. Let's plow ahead.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
