Reading a Mixture of Record Types in One DATA Step - SAS Programming

Consider the following situation: you've been given a set of raw data that combines data lines from different sources. All the lines contain the same data fields, but the fields are in different positions in each raw data line, depending on the source of the data. An identifying value in each observation denotes the source of the data and, therefore, the format of the data values for that observation.

This is not at all an unusual situation, and it is one that the SAS System can handle readily. By using a trailing @, the INPUT statement gives you the ability to read a part of your raw data line, test it, and then decide how to read additional data from the same record.

Background: How a DATA Step Builds an Observation

Before you proceed further, you have to know a little about how a DATA step builds an observation in a SAS data set and how an INPUT statement operates with multiple lines of raw data. A DATA step begins when the DATA keyword is encountered, and ends when a DATALINES statement, a RUN statement, another DATA keyword, or a PROC keyword is encountered.

In all the examples so far, each time an INPUT statement executed, a pointer moved to a new record. If, however, you include a single @ at the end of the INPUT statement (before the semicolon), the next INPUT statement in the same DATA step does not bring a new record into the input buffer but continues reading from the same raw data line as the preceding one.
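The effect of the single trailing @ can be seen in a minimal sketch. The data set name HOLD, the variables A and B, and the two data lines are made up for illustration and are not part of the original example.

```sas
DATA HOLD;                /* hypothetical data set name */
   INPUT A 1-2 @;         /* trailing @ holds the current line */
   INPUT B 4-5;           /* reads from the same line, then releases it */
DATALINES;
11 12
21 22
;
RUN;
```

With the trailing @ in place, each iteration reads A and B from the same line, so HOLD should contain two observations. If you remove the @, the second INPUT statement moves to the next line instead, so B is read from the second data line and only one observation is built.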

At the end of the DATA step, an observation is written to the SAS data set (unless you explicitly use an OUTPUT statement somewhere in the DATA step; see Example 15.2 in this chapter and Example 6.1 in Chapter 3). On the next iteration of the DATA step, the pointer moves to the next record and the INPUT statement begins processing again.
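As a brief sketch of the exception mentioned above, the following step uses an explicit OUTPUT statement. The data set name ADULTS, the age cutoff, and the data lines are assumptions made for illustration only.

```sas
DATA ADULTS;                      /* hypothetical data set name */
   INPUT ID 1-3 AGE 4-5;
   IF AGE GE 18 THEN OUTPUT;      /* explicit OUTPUT: no automatic write */
                                  /* at the end of the DATA step         */
DATALINES;
00134
00212
;
RUN;
```

Because an OUTPUT statement appears in the step, SAS writes an observation only when that statement executes, so only the first data line should produce an observation here.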

Back to Our Mixed Records Example

Now back to our example with mixed records. A 1 in column 20 specifies that your data contain values for ID in columns 1-3, AGE in columns 4-5, and WEIGHT in columns 6-8; a 2 in column 20 specifies that the value for ID is in columns 1-3, AGE is in columns 10-11, and WEIGHT is in columns 15-17. The following code correctly reads the data:

Example

DATA MIXED;
   INPUT @20 TYPE $1. @;          /* (1) */
   IF TYPE = '1' THEN             /* (2) */
      INPUT ID     1-3
            AGE    4-5
            WEIGHT 6-8;
   ELSE IF TYPE = '2' THEN        /* (3) */
      INPUT ID     1-3
            AGE    10-11
            WEIGHT 15-17;
DATALINES;
00134168           1
00245155           1
003      23   220  2
00467180           1
005      35   190  2
;
PROC PRINT DATA=MIXED;
   TITLE 'Example ';
RUN;

The program works as follows:

  • (1) After reading a value for TYPE in the first INPUT statement, the single trailing @ says, "hold the line"; that is, do not go to a new data line if you encounter another INPUT statement.
  • (2) The IF-THEN/ELSE code tests the current value of TYPE and proceeds accordingly. If the value of TYPE is 1, the program uses the next INPUT statement to read ID, AGE, and WEIGHT.
  • (3) If TYPE = 2, an alternate INPUT statement is used.
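The example assumes that every record carries a 1 or a 2 in column 20. One possible way to handle any other value is sketched below; the data set name MIXED_CHECKED, the log message, and the third data line (with a 9 in column 20) are hypothetical additions, not part of the original program.

```sas
DATA MIXED_CHECKED;                  /* hypothetical data set name */
   INPUT @20 TYPE $1. @;             /* read the record type, hold the line */
   IF TYPE = '1' THEN
      INPUT ID 1-3 AGE 4-5 WEIGHT 6-8;
   ELSE IF TYPE = '2' THEN
      INPUT ID 1-3 AGE 10-11 WEIGHT 15-17;
   ELSE DO;
      PUT 'Unexpected record type: ' TYPE= _N_=;   /* note in the SAS log */
      DELETE;                        /* write no observation for this record */
   END;
DATALINES;
00134168           1
003      23   220  2
006      41   175  9
;
RUN;
```

A line held by a single trailing @ is also released automatically when the DATA step starts its next iteration, so the record with the unknown type does not block the records that follow it.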

When a second INPUT statement (one without a trailing @) is encountered, the data line is released and you are ready for the next iteration of the DATA step. The code produces the following output:

Output from Example - Reading a Mixture of Record Types in One DATA Step


As you can see, all values are assigned to their proper data set variables, regardless of which columns they are read from. Now if you think that a single trailing @ was neat stuff, just wait till the next example.

