# Computing a Moving Average - SAS Programming

Performing computations between observations in a DATA step is far more difficult than within-observation processing. For example, to compute the mean of X1, X2, and X3 within a single observation, you would write:

MEANX=MEAN(OF X1-X3); or MEANX=(X1+X2+X3)/3;

However, computing the mean of X for the present observation and the two previous observations presents more of a challenge (without the LAG function, that is). Economists often compute a "moving average" to smooth out trends in their data. For example, stock indices such as the Dow Jones average can change considerably from month to month.

To see the trend in this index, economists plot, for each month of interest, the average of the index over the past three months. This smooths out the data so that longer-term trends are more apparent. In this example, you use the LAG function to compute a moving average. LAGn returns the value from the nth previous execution of the LAG function. That is, every time the LAG function executes, it "remembers" the current value of the argument, which will be the lagged value the next time the function executes. An example will make this clear. Here is the code to compute the moving average just described:

Example

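DATA MOVING;
   SET OLD;
   X1=LAG(X);              /* X from the previous observation */
   X2=LAG2(X);             /* X from two observations back */
   AVE=MEAN(OF X X1 X2);   /* mean of the three values */
   IF _N_ GE 3 THEN OUTPUT;
RUN;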

The variable X1 is the value of X from the previous observation; X2 is the value of X from the observation before that. The MEAN function is used to compute the average (mean) of the three values. For the first iteration of the DATA step, LAG(X) and LAG2(X) are missing since there was no previous execution of the LAG function. For the second iteration of the DATA step, LAG(X) is assigned the value of X for observation 1, and LAG2(X) is missing. Finally, for the third through the last iteration of the DATA step, LAG(X) and LAG2(X) are assigned values. Because the MEAN function ignores missing values, AVE in the first two iterations would be based on only one or two values. In this example, you do not output an observation in the new data set unless AVE is actually based on three values. The _N_ variable is useful for testing this condition. You output an observation for the third through last iteration of the DATA step only.
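To watch the lagged values line up, it can help to run the step on a few known numbers. The sketch below is only an illustration: the five values read into OLD are made up for the purpose, and the PROC PRINT step is added just to display the result.

```sas
/* Hypothetical sample data standing in for the data set OLD */
DATA OLD;
   INPUT X @@;
DATALINES;
10 20 30 40 50
;
RUN;

DATA MOVING;
   SET OLD;
   X1=LAG(X);              /* X from the previous observation */
   X2=LAG2(X);             /* X from two observations back */
   AVE=MEAN(OF X X1 X2);   /* three-point moving average */
   IF _N_ GE 3 THEN OUTPUT;
RUN;

PROC PRINT DATA=MOVING;
RUN;
```

With these made-up values, MOVING should contain three observations: X=30 (X1=20, X2=10, AVE=20), X=40 (X1=30, X2=20, AVE=30), and X=50 (X1=40, X2=30, AVE=40). The first two iterations are dropped because X2 is still missing there.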

A Special Caution When Using the LAG Function

CAUTION! Do not execute the LAG function conditionally unless you are purposely doing something very tricky and really know what you are doing. To prove our point, look at the following code:

DATA ERROR;
   INPUT X @@;
   IF X GE 5 THEN Y=LAG(X);
DATALINES;
1 8 3 9 2
;
RUN;

What are the values of Y? (Answer: missing, missing, missing, 8, missing.) The IF statement instructs the system to execute the LAG function only when X is greater than or equal to 5. It is therefore first executed when the second observation (X=8) is read, and it returns a missing value because there was no previous execution. The next time the LAG function executes (observation number 4, X=9), the value of LAG(X) is 8, the value of X the last time the LAG function executed. For the other observations the assignment never runs, so Y stays missing. Did you get that point? If not, don't fret. This is not easy stuff.
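One common way to sidestep the problem, sketched here as an illustration, is to execute LAG on every iteration and apply the condition only to the assignment. The data set name FIXED and the variable LAGX are hypothetical names chosen for this sketch.

```sas
DATA FIXED;
   INPUT X @@;
   LAGX=LAG(X);               /* executes on every iteration */
   IF X GE 5 THEN Y=LAGX;     /* only the assignment is conditional */
DATALINES;
1 8 3 9 2
;
RUN;
```

Here Y should be missing, 1, missing, 3, missing: whenever X is 5 or more, Y holds the value of X from the immediately preceding observation, which is usually what was intended.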