Reading Two Lines (Records) per Observation - SAS Programming

You now know how to specify column ranges of raw data when reading values into SAS variables, and you can even jump around within a line of data. But what if the data for an observation span multiple lines (records) in the raw data input file? You can easily tell the SAS System which line contains the next data value to read by using a line pointer (# or /).

We cover only the basic situation here, in which every observation occupies the same number of lines. Real-life situations can be much more complicated, with different numbers of records for different observations. We leave this and other advanced tasks as lookup assignments for you, to be completed when the need arises.

Suppose we extend the last example by adding a second line of data per observation. The new data description is as follows:

Notice that both lines of raw data contain the subject ID number. This is a good policy in general and aids data integrity and validity checking. You could read the ID number from both records for each subject (with a different variable name for each) and check one against the other before proceeding to the next observation, but you do not do so here. The following code reads the data, two records per observation.

Example

DATA POINTER;
   INPUT #1 @1  ID     3.
            @5  GENDER $1.
            @7  AGE    2.
            @10 HEIGHT 2.
            @13 DOB    MMDDYY6.
         #2 @5  SBP    3.
            @9  DBP    3.
            @13 HR     3.;
   FORMAT DOB MMDDYY8.;
DATALINES;
101 M 26 68 012366
101 120 80  68
102 M 32 78 031460
102 162 92  74
103 F 45 62 112647
103 134 86  74
104 F 22 66 080170
104 116 72  67
;
PROC PRINT DATA=POINTER;
   TITLE 'Example ';
RUN;

The #'s in the INPUT statement tell the SAS System which raw data lines to access when reading in values. In this case, the instructions are to obtain values for ID, GENDER, AGE, HEIGHT, and DOB from line 1 (#1) of each observation, and to obtain values for SBP, DBP, and HR from line 2 (#2) of each observation. Although values for the ID number are present on both records for each observation, you read them only from line 1 in this example. Output for this code is as follows:

Output from Example - Reading Two Lines (Records) per Observation

If the raw data consist of the same number of records per observation, and if all records are to be read for each observation, as is the case in the current example, then you do not have to name the numbered record for each subset of variables. Instead, you can tell the system to go to the next line after reading the final value from the current line. This is accomplished with the relative line pointer (/). The following INPUT statement could be used instead of the previous one, and the output would be identical:

INPUT @1  ID     3.
      @5  GENDER $1.
      @7  AGE    2.
      @10 HEIGHT 2.
      @13 DOB    MMDDYY6.
    / @5  SBP    3.
      @9  DBP    3.
      @13 HR     3.;

In this case, the SAS System begins to read data from the first raw data input record. After reading values for ID through DOB, it moves to the next raw data record for three more variables (SBP, DBP, HR). These variables are all part of the same observation being built in the SAS data set. When the system finishes building the current observation, it advances to the next raw data record and starts to build the next one. Using absolute line pointers, as in the first INPUT statement, is preferable to using the relative pointer control shown above.

Absolute line pointer control lets you move forward or backward among the records, removes the risk of miscounting slashes, and makes the code easier to read. We show you the relative line pointer method because you may encounter programs that use it.
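The validity check mentioned earlier, reading the ID number from both records and comparing the two values, might be sketched as follows. This is only an illustration under stated assumptions: the data set name CHECKID and the variable names ID1 and ID2 are invented for this sketch, and the check does nothing more than write a note to the SAS log when the two IDs disagree.

```sas
/* Sketch only: CHECKID, ID1, and ID2 are illustrative names */
DATA CHECKID;
   INPUT #1 @1  ID1    3.
            @5  GENDER $1.
            @7  AGE    2.
            @10 HEIGHT 2.
            @13 DOB    MMDDYY6.
         #2 @1  ID2    3.
            @5  SBP    3.
            @9  DBP    3.
            @13 HR     3.;
   FORMAT DOB MMDDYY8.;
   /* Write a note to the log when the two ID values disagree */
   IF ID1 NE ID2 THEN
      PUT 'ID mismatch in observation ' _N_ ID1= ID2=;
DATALINES;
101 M 26 68 012366
101 120 80  68
102 M 32 78 031460
102 162 92  74
;
```

With the data shown, the IDs on both records agree, so nothing is written to the log. To see the check fire, change one of the ID values on a second record.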

Skipping Selected Input Records

And now, one last wrinkle before we leave the topic of multiple raw data input records per observation. Suppose there are many (or even two) lines of raw data per observation, and you wish to read from only a few of them (or even one). You might wonder why anyone would enter the extra lines to begin with.

One answer is that you may be wrapping SAS code around an existing file of raw data that contains more than you need for the current application, but there are compelling reasons not to reshape the data (extra effort involved, chance of mutilating perfectly good valid data, etc.). How can you not read certain lines? There are two methods.

Either use multiple slashes (///) to skip unwanted lines, or explicitly direct the INPUT statement to only the desired lines by using numbered #'s. In either case, if there are unwanted records at the end of each set of raw data lines, they must be accounted for. Suppose you have a file of raw data consisting of four records per observation, and you want to read only two variables from the second record of the four. The following code accomplishes this.

Example

DATA SKIPSOME;
   INPUT #2 @1  ID  3.
            @12 SEX $6.
         #4;
DATALINES;
101 256 RED 9870980
101 898245 FEMALE 7987644
101 BIG 9887987
101 CAT 397 BOAT 68
102 809 BLUE 7918787
102 732866 MALE   6856976
102 SMALL 3884987
102 DOG 111 CAR 14
;
PROC PRINT DATA=SKIPSOME;
TITLE 'Example ';
RUN;

The previous INPUT statement instructs the system to go directly to the second available line of input data (#2), read data for two variables from that line (ID, SEX), and then go directly to the fourth line of input (#4). It is essential to include the #4 pointer even though you are not reading any data from line 4.

It is needed so that the correct number of lines is skipped and the program reads the correct line of data at the beginning of each observation. On each iteration of the DATA step, line 2 is read and the pointer then moves to the fourth line. The previous code yields the following output:

Output from Example - Skipping Selected Input Records

As expected, the only data that are read, converted into data set variables, and subsequently printed are those for ID and SEX.
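For comparison, the other method, skipping lines with slashes, might look like the sketch below. It is offered as an illustration rather than as part of the original example: the data set name SKIPSLASH is invented here, and the sketch assumes exactly four records per observation, with SEX occupying columns 12 through 17 of the second record.

```sas
/* Sketch only: SKIPSLASH is an illustrative data set name */
DATA SKIPSLASH;
   /* The first slash moves from record 1 to record 2.        */
   /* The two trailing slashes move past record 3 to record 4, */
   /* so the next iteration starts on the following subject.   */
   INPUT / @1  ID  3.
           @12 SEX $6.
         / /;
DATALINES;
101 256 RED 9870980
101 898245 FEMALE 7987644
101 BIG 9887987
101 CAT 397 BOAT 68
102 809 BLUE 7918787
102 732866 MALE   6856976
102 SMALL 3884987
102 DOG 111 CAR 14
;
PROC PRINT DATA=SKIPSLASH;
RUN;
```

Counting slashes is exactly the step that is easy to get wrong, which is why the numbered pointers (#2 and #4) are the safer choice when several records must be skipped.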