Holding the Data Line through Multiple Iterations of the DATA Step - SAS Programming

If a single trailing @ tells the system to "hold the line," what do you suppose a double trailing @ would instruct it to do? Why, "hold the line more strongly," of course! What does this translate into? Remember that under normal conditions, one complete iteration of the DATA step constructs one observation in a SAS data set from one raw data line.

The DATA step then repeats this process, again and again, until there are no raw data lines left to read. Each time an INPUT statement ending with a semicolon (and no trailing @) is executed, the pointer moves to the next record. By using a double trailing @ (@@), you can instruct the SAS System to use multiple iterations of the DATA step to build multiple observations from each record of raw data.

An INPUT statement ending with @@ instructs the program to release the current raw data line only when there are no data values left to be read from that line. The @@, therefore, holds an input record even across multiple iterations of the DATA step. This is different from the single trailing @, which holds the line for the next INPUT statement in the same iteration but releases it when an INPUT statement executes in the next iteration of the DATA step.
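To make the contrast concrete, here is a minimal sketch that ends the INPUT statement with a single trailing @ instead of @@ and reads the same two data lines used in the SHORTWAY example later in this section. The data set name SINGLEAT is only an illustrative choice. Because the single @ lets go of the line when the next iteration begins, each iteration should start reading at the beginning of a new line, so only the first pair on each line (1 2 and 6 9) would be read: two observations instead of six.

```sas
/* Illustrative comparison: single trailing @ instead of @@ */
DATA SINGLEAT;
   INPUT X Y @;   /* holds the line only within the current iteration */
DATALINES;
1 2 3 4 5 6
6 9 10 12 13 14
;
PROC PRINT DATA=SINGLEAT;
   TITLE 'Single Trailing @';
RUN;
```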

The next two programs accomplish the same result; they build identical SAS data sets. Notice that these examples use list input. You can use @@ with other kinds of INPUT statements, but it makes the most sense with list input. As a matter of fact, we're hard pressed to think of a non-esoteric situation in which you would want to use @@ with column input. The first program below does not use @@, so that you can see the comparison:

Example

DATA LONGWAY;
INPUT X Y;
DATALINES;
1 2
3 4
5 6
6 9
10 12
13 14
;
PROC PRINT DATA=LONGWAY;
TITLE 'Example';
RUN;

Now here is the short way, using @@, with considerably fewer data lines:

Example

DATA SHORTWAY;
INPUT X Y @@;
DATALINES;
1 2 3 4 5 6
6 9 10 12 13 14
;
PROC PRINT DATA=SHORTWAY;
TITLE 'Example';
RUN;

Here's how it works. Data values are read in pairs. Three pairs are read from the first data line, and then the INPUT statement goes to the next data line for more data. The important thing to realize is that, although there are only two raw data lines, the DATA step actually iterates six times, once for each pair of values read into the variables X and Y named on the INPUT statement. The @@ stops the system from going to a new raw data line each time the INPUT statement executes. In effect, all the data values can be thought of as strung out in one continuous line of data.

Using @@ causes the DATA step to keep reading from a data line until there are no more data values to read from that record (that is, the pointer reaches the end of the record), or until a subsequent INPUT statement with no line-hold specifier (no trailing @ or @@) executes. Here is the output.

Output from Example - Holding the Data Line through Multiple Executions of the DATA Step
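If you want to watch the six iterations happen, one approach is to add a PUT statement that writes the automatic variable _N_ (the iteration counter) and the values just read to the SAS log on every pass. This is a sketch; the data set name SHORTWAY2 is hypothetical.

```sas
/* Sketch: log one line per DATA step iteration */
DATA SHORTWAY2;
   INPUT X Y @@;      /* @@ keeps the record across iterations */
   PUT _N_= X= Y=;    /* writes, e.g., _N_=1 X=1 Y=2 to the log */
DATALINES;
1 2 3 4 5 6
6 9 10 12 13 14
;
RUN;
```

The log should contain six such lines, ending with _N_=6 X=13 Y=14, even though only two data lines were supplied.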

Extra Caution with Missing Values and @@

Remember what happened way back at the beginning of this monumental chapter when you were missing input data for a single variable and didn't use a period to represent the missing value? Two raw input data lines were incorrectly merged into one very wrong data set observation. Only a small amount of data was affected because each new execution of the DATA step started with a new data line.

The system "caught up" with itself and then got back on track. If the same missing data situation were present when using @@, all succeeding values in all succeeding observations would be in error. (This is not to say that a single incorrect data value is acceptable!)
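One way to keep the pairs aligned is the remedy mentioned above: type a period wherever a value is missing. Here is a minimal sketch, assuming the second X value (3) is unknown; the data set name WITHDOT is hypothetical. The period holds the place of the missing X, so the DATA step should still build six correctly paired observations, the second one with X missing and Y=4.

```sas
/* Sketch: a period marks the missing X so the pairs stay aligned */
DATA WITHDOT;
   INPUT X Y @@;
DATALINES;
1 2 . 4 5 6
6 9 10 12 13 14
;
PROC PRINT DATA=WITHDOT;
   TITLE 'Missing Value Marked with a Period';
RUN;
```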

To illustrate what we mean about compounding errors, suppose you inadvertently omitted the second X value, 3, in the SHORTWAY program and entered the first line of data as 1 2 4 5 6. From that point on, the program would be "out of whack" and would be reading Y values for X's, and vice versa, until it reached the end of the raw data, where it would look in vain for that last Y value.
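A sketch of the program being described, reconstructed from the paragraph above, is the SHORTWAY program with the 3 dropped from the first data line:

```sas
/* SHORTWAY with the X value 3 accidentally omitted */
DATA SHORTWAY;
   INPUT X Y @@;
DATALINES;
1 2 4 5 6
6 9 10 12 13 14
;
PROC PRINT DATA=SHORTWAY;
   TITLE 'Example';
RUN;
```

Reading the values in pairs gives (1,2), (4,5), (6,6), (9,10), and (12,13); the final 14 is read as X with no Y left to pair it with, so that last observation is never written.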

The output would be as follows:

Output from Example - When a Data Value is Left Out


Read the SAS Log!

If you just skim the output, everything might look right. Don't stop there! All SAS System jobs are accompanied by a SAS log that documents the processing of the SAS statements and the manipulation of SAS data sets, and notes which procedures were executed.
The SAS log accompanying the previous SAS program looks (in part) something like this:

1 DATA SHORTWAY;
2 INPUT X Y @@;
3 DATALINES;
NOTE: LOST CARD.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+
6 ;
X=14 Y=. _ERROR_=1 _N_=6

NOTE: 1. SAS went to a new line when INPUT statement reached past the end of a line.
2. The data set WORK.SHORTWAY has 5 observations and 2 variables.
3. The DATA statement used 0.06 CPU seconds and 2389K.

The LOST CARD note in the SAS log is the system's way of telling you that a problem has occurred. In this case, it should lead to an examination of the raw data input values, where the problem could be easily found and corrected. Of course, this is a simple illustrative example, and real-life situations are not this easy to deal with.

The moral is abundantly clear. When you see messages like this in the SAS log, do not ignore them. The system is trying to tell you something. By the way, if you had been unlucky enough to omit two values somewhere in the data stream above, the system would never detect the problem.

Although the two omissions would seem to cancel each other out, all intervening data values in the SAS data set would be wrong. As is true with most SAS System tools, double trailing @'s are very powerful. But you must use them with care.
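The two-omission case can be sketched the same way. In this hypothetical version (data set name TWOGONE), the 3 is missing from the first line and the 14 from the second. Ten values remain, so the DATA step should form five complete pairs and finish without a LOST CARD note, even though every pair after (1,2) is wrong.

```sas
/* Hypothetical: two omitted values (3 and 14) silently "cancel out" */
DATA TWOGONE;
   INPUT X Y @@;
DATALINES;
1 2 4 5 6
6 9 10 12 13
;
PROC PRINT DATA=TWOGONE;
   TITLE 'Two Omitted Values';
RUN;
```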


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
