Restructuring a SAS Data Set: Creating a Single Observation from Multiple Observations - SAS Programming

Now you do in reverse!You again restructure a data set, but this time you are going from a data set with multiple observations per subject to a data set with a single observation per subject. Your original data set OLD contains one observation for each subject-time combination, with a measurement value, X, as follows:

Restructuring a SAS Data Set

You want a data set NEW that contains one observation per subject, with all measurements on each record, as follows:

Data set New

XI in data set NEW is the value of X at time 1 in data set OLD, etc. Here's how to do it:

Example

PROC SORT DATA=OLD; H
BY SUBJECT TIME;
RUN;
DATA NEW;
SET OLD; 0
BY SUBJECT;
ARRAY XX|*] X1-X3;
RETAIN X1-X3; ©
IF FIRST.SUBJECT=1 THEN DO 1=1 TO 3; 0
XX[I]=.!
END;
XX[TIME]=X; 0
IF LAST.SUBJECT^ 1 THEN OUTPUT; ®
KEEP SUBJECT X1-X3; 0
RUN;

This program uses a number of powerful SAS System programming features.You start by sorting the data by SUBJECT, and TIME within SUBJECT 0.If you are really sure the data set is already in SUBJECT- TIME order, the sort is unnecessary. Next, you create a new data set NEW by reading in observations from data set OLD 0. In general, a BY statement following a SET statement sets up special temporary variables called FIRST.

fryvar and LAST.fryvar for each of the variables in the BY list.They are temporary because they are not saved with the data set, but rather they exist for the duration of the DATA step only; FIRST.fryvar has a value of 1(true) for every observation that is the first occurrence of a new BY group, and a value of 0 (false) for all other occurrences. Likewise, LAST.byvar has a value of 1 for the last occurrence of a BY group,and a value of 0 for all other occurrences.

In our example, the temporary variables work as follows. After being sorted, data set OLD exists in the sorted order of TIME within SUBJECT. Using the BY statement following the SET statement in the DATA step, you instruct the SAS System to create the internal variables FIRST.SUBJECT and LAST.SUBJECT. Each time you begin a new SUBJECT BY group (begin reading observations for a new subject), the variable FIRST.SUBJECT 0 is true (set equal to 1).

At all other times, FIRST.SUBJECT is false (equal to 0).When you encounter the last observation for a subject, LAST.SUBJECT 0 is true (1). At all other times, it is false (0).You use these logical (true or false) FIRST.fryvar and LAST.byvar variables to accomplish different goals.

The program works by accumulating values for all times for each subject from OLD before outputting an observation to NEW. It's a little tricky, so pay attention. A RETAIN statement © is used to "remember" the values of X1-X3 until it's time to output them.When the first observation from OLD is read, SUBJECT=1, TIME=1, X=5, FIRST. SUBJECT=1, and LAST.SUBJECT=0. The elements of ARRAY XX, (XI, X2, X3)are initialized to missing (.) 0. Although this initialization step is not necessary here, if you are missing an observation for one or more times for a subject, you would retain the values from the previous subject if you did not initialize XI to X3 to missing.This initialization is done each time a new subject is read in (FIRST.SUBJECT=1). Statement © instructs the program to set the element of XX with a subscript equal to the current value of TIME, equal to the current value of X.

For the first observation,TIME=1 and X=5.Since TIME = 1, XX[TIME] is the same as XX[1] which represents the variable XI (set equal to 5, the value of X).

The next observation from OLD is then read (SUBJECT=1, TIME=2, X=6, FIRST.SUBJECT=0, and LAST.SUBJECT=0.) Values of X1-X3 are retained from the previous observation (Xl=5, X2=., X3=.) and XX[TIME], or XX[2], or X2, is set equal to 6, the current value of X in OLD. This happens again for the next observation resulting in Xl=5, X2=6, and X3=7.This time however, since LAST.SUBJECT ® is true (equal to 1),the observation is ouput to NEW. Only the variables SUBJECT, XI, X2, and X3 are kept 0. Th process is then repeated for SUBJECT 2.

Problems

  1. You have SAS data set OLD which contains the variables ID,HTIN1-HTIN10 (10 height measurements in inches), and WTLB1-WTLB10 (10 weight measurements in pounds.) Write a program using arrays to create a new data set NEW with variables ID (same ID),HTCM1-HTCM10 (same 10 heights - now in centimeters), and WTKG1-WTKG10 (same 10 weights - now in kilograms.) The conversions are: (Height in CM) = 2.54 * (Height in inches)
  2. And (Weight in KG) = (Weight in pounds) / 2.2.

  3. You have a SAS data set SURVEY which uses missing value codes of-1 for variables X1-X100,99 for variables Y1-Y50, and the literal string 'NO DATA' for the character variables A, B, C, D, and E. Use arrays to replace these missing values with standard SAS missing values.
  4. Given the SAS data set ONEPER as shown, use an array to restructure this data set into the
  5. SAS data set FOURPER:

    Data set ONEPER

  6. Modify your program in Problem 7-3 so that the values for TREAT are translated from numeric values 1, 2, 3, and 4 into character strings 'A', 'B', 'C', and 'D', respectively.
  7. Hint: Use another variable in the DO loop and compute TREAT inside the loop.

  8. Given the SAS data set THREEPER as shown, use arrays to restructure this data set into the SAS data set ONEPER.
  9. Notes: 1. There are no missing values in the data set.
    2. The data set is already sorted in ID order.

    Data set THREEPER


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SAS Programming Topics