Encountering a Subtle Problem with Missing Values, Formats, and PROC FREQ - SAS Programming

In the previous example,you used formats to either display a translated version of a variable's values, or to create a new variable as a grouped version of another variable. Formats can also be used to aggregate data values together for group processing and presentation in procedures that operate on groups of values such as PROC FREQ,a procedure that counts the number (frequency) of occurrences for each formatted value of a variable.

This process of labeling or grouping data values using a format usually has no adverse side effects; however,unexpected results can occur in PROC FREQ, specifically when formatting a variable that contains missing values. Let's look at an example that demonstrates this rather subtle problem.

Example

PROC FORMAT;
VALUE BADFMT 1='ONE'
2='TWO'
OTHER='MISCODED';
RUN;
DATA TEST;
INPUT X Y;
DATALINES;
1 1
2 2
5 5
3 .
;
PROC FREQ DATA-TEST;
TABLES X Y;
FORMAT X Y BADFMT.;
RUN;

Output from Example - Encountering a Subtle Problem with Missing Values, Formats, and PROC FREQ

Output from Example - Encountering a Subtle Problem with Missing Values, Formats, and PROC FREQ

Notice that Output 2 shows the category MISCODED with a frequency count of 2 for the variable X. This represents the two data values that fall into the special format category called OTHER, namely the 5 in Observation 3,and the 3 in Observation 4. Next, look at the output for the variable Y.Notice that there is no longer a row with a label of MISCODED. Why not? The two values in Observations 3 and 4,namely the 5 and the period (missing), are neither 1 nor 2, and therefore should (you would think) be included in the OTHER category.Instead of this, however, the output shows Frequency Missing = 2 at the bottom. Even though there was only one observation that contained a missing value for the variable Y(Observation 4),PROC FREQ reports two missing values.

The reason for this is a little convoluted, but worth the struggle to understand.In this example,since missing values do not have their own separate range in the BADFMT. format, they automatically fall into the OTHER category.Unfortunately,the inclusion of missing values in this OTHER category declares the entire category as missing.The reason is that when a range of values,or a group of discrete values, are grouped together in a single format, all the values in the range or series are represented by the lowest value in the range or series.

Since missing is the lowest value possible,the inclusion of missing in a range, or series, of values translates all the values into missing for formatting purposes.This means that if any observation contains a missing value,all the observations that fall into the OTHER range are treated as missing.PROC FREQ determines the number of missing values after formatting has occurred. Most other procedures that work with grouped data values, such as PROC MEANS and PROC TABULATE,check for missing values before formatting.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SAS Programming Topics