# Checking for a New Subject Number Using a LAG Function and a Trailing @ - SAS Programming

A truly compulsive SAS System programmer like at least one of the authors (well, maybe at least two of them) cannot leave this topic without presenting just one more solution. In this last example, the SAS System reads in a value for DOB only when it reaches what it knows to be a new subject.

How does it know it's a new subject? It simply reads in a value for SUBJ and then uses the trailing @ to hold the line while it tests to see whether it has a new subject. It uses the LAG function to make the test and then treats the remainder of the input line accordingly. If it is a new subject, it reads in a new DOB; if not, DOB equals its retained value. Here is the code:

Example

DATA COMPULSE;
RETAIN DOB;
INFILE 'HTWT';
INPUT @1 SUBJ $2. @;
IF SUBJ NE LAG(SUBJ) THEN
   INPUT @7 DOB MMDDYY8.
         @17 WEIGHT 3.;
ELSE
   INPUT @17 WEIGHT 3.;
FORMAT DOB MMDDYY8.;
PROC PRINT DATA=COMPULSE;
TITLE 'Another Correct DOB Solution';
RUN;

The results are identical to the listing shown in the earlier output (except for the order of the variables). The examples in this chapter that are based on the LAG function require that all the data for a single subject be grouped together in the input stream. If this is not the case, the data set can be read as is and then sorted with PROC SORT.

Of course, the data would not then be input into the new data set being constructed, but rather would be set in from an existing SAS data set. Also, if you actually have to make another pass through the data after they are sorted, you can use the FIRST. and LAST. variables as an alternative to the LAG function.
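The sort-then-set approach described above can be sketched as follows. This is only an illustration under assumed names: the data set names VISITS and FLAGGED, the flag variables NEW_SUBJ and END_SUBJ, and the inline data values are all hypothetical and do not come from the HTWT file.

```sas
/* Hypothetical ungrouped data: records for a subject are not adjacent */
DATA VISITS;
   INPUT SUBJ $ WEIGHT;
DATALINES;
02 160
01 155
02 162
01 158
;

/* Group the records for each subject together */
PROC SORT DATA=VISITS;
   BY SUBJ;
RUN;

/* SET with BY creates the automatic variables FIRST.SUBJ and LAST.SUBJ */
DATA FLAGGED;
   SET VISITS;
   BY SUBJ;
   NEW_SUBJ = FIRST.SUBJ;   /* 1 on the first record of each subject, else 0 */
   END_SUBJ = LAST.SUBJ;    /* 1 on the last record of each subject, else 0 */
RUN;

PROC PRINT DATA=FLAGGED;
   TITLE 'FIRST. and LAST. Flags After Sorting';
RUN;
```

Because FIRST.SUBJ and LAST.SUBJ are automatic variables and are not written to the output data set, the sketch copies them into ordinary variables so they appear in the listing. A test such as IF FIRST.SUBJ THEN ... plays the same role here as the comparison of SUBJ with LAG(SUBJ).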

One last comment about this example. You use the DROP statement here to drop OLD_DOB from the data set you are building. Either the KEEP or the DROP statement can be used to achieve the same results. Older versions of the SAS System will not allow both in the same DATA step.

Problems

1. You have a SAS data set DIET which contains variables ID, DATE, and WEIGHT. There are four records per ID, and the records are sorted by DATE within ID. The task is to create a new SAS data set DIET2 from DIET which contains only one record per subject, with each record containing the subject ID and the mean weight for the subject. This problem could easily be solved by using PROC MEANS or other methods, but for the purposes of this exercise, use a DATA step with a RETAIN statement in your solution. As an additional "learning experience," rewrite the code using a sum statement (not a SUM function).

   Hint: Include a BY ID statement after a SET statement in the DATA step, and then use FIRST. and LAST. variables. The data for the first two subjects of data set DIET are shown below.

2. You have a collection of raw data (in external file TESTSCOR) representing reading scores on three groups of subjects: control, method A, and method B (group codes are C, A, and B, respectively). The data are arranged so that a group code is followed by one or more scores for that group, and scores for any group can span more than one record of raw data. (Unfortunately, this is not at all an uncommon pattern in which data can and do occur in the real world.) Your task (should you choose to accept it) is to write a program which will read these data and create a SAS data set READING with variables GROUP and SCORE, one set per observation.

   Hint: Here is one way to get started; there are others. Read every data item in the raw data file as a character value and test if it is an 'A', 'B', or 'C'. If it is one of those values, set GROUP equal to that value and then read the numeric data. If not, convert the character (numeral) you just read to a number using the INPUT function (syntax: SCORE = INPUT(CHARVAR, 5.);). Don't worry, we still left some work for you to do.
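The INPUT function named in the hint can be tried in isolation before it is used in a full solution. The following minimal sketch assumes a hard-coded value for CHARVAR instead of reading TESTSCOR, so it shows only the character-to-numeric conversion and not the reading logic the problem asks for.

```sas
/* Convert a character value holding a numeral to a numeric value */
DATA _NULL_;
   CHARVAR = '87';               /* assumed sample value, not from TESTSCOR */
   SCORE = INPUT(CHARVAR, 5.);   /* read the text with the numeric informat 5. */
   PUT SCORE=;                   /* write the numeric result to the SAS log */
RUN;
```

DATA _NULL_ runs the statements without creating a data set, which is convenient for a quick check like this one.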

Here are some sample data from external file TESTSCOR: