# Formatting Values in a Questionnaire - SAS Programming

In this first example, you have collected questionnaire data and you want to create a finished report that shows meaningful descriptions rather than the numeric codes used for data entry. Your raw data file contains an ID (columns 1-2), GENDER (column 4), RACE (column 6), AGE (columns 8-9), and two attitude items, SATISFY (column 11) and TIME (column 13).

Scales that express a range of attitudes using numbers such as 1 to 5 are sometimes referred to as Likert scales, named after a psychometrician who published studies on how to measure attitudes. The five-point Likert scale you are using is: 1=strongly disagree, 2=disagree, 3=no opinion, 4=agree, 5=strongly agree. You also want to create a new variable called AGEGROUP, which groups age into 20-year intervals up to age 60 and places all ages of 60 and above in a single group.

The formats you need are not supplied with the system, so you have to create them. This is done with PROC FORMAT, in which you create one or more formats, each defined by a VALUE statement. Here is the code to produce the formats you need, as well as the DATA step to create the data set and the PROC PRINT to produce the finished report.

Example

PROC FORMAT;
   /* Numeric format: no dollar sign in the name */
   VALUE GENDER 1 = 'Male'
                2 = 'Female'
                . = 'Missing'
            OTHER = 'Miscoded';
   /* Character formats: name begins with a dollar sign */
   VALUE $RACE 'C' = 'Caucasian'
               'A' = 'African American'
               'H' = 'Hispanic'
               'N' = 'Native American'
             OTHER = 'Other'
               ' ' = 'Missing';
   VALUE $LIKERT '1' = 'Str dis'
                 '2' = 'Disagree'
                 '3' = 'No opinion'
                 '4' = 'Agree'
                 '5' = 'Str agree'
               OTHER = ' ';
   /* Numeric ranges: -< excludes the upper endpoint */
   VALUE AGEGROUP LOW-<20 = '< 20'
                   20-<40 = '20 to <40'
                   40-<60 = '40 to <60'
                  60-HIGH = '60+';
RUN;

DATA QUESTION;
   /* Column input: the data lines below must stay aligned to these columns */
   INPUT ID      $ 1-2
         GENDER    4
         RACE    $ 6
         AGE       8-9
         SATISFY $ 11
         TIME    $ 13;
   FORMAT GENDER       GENDER.
          RACE         $RACE.
          SATISFY TIME $LIKERT.;
   /* Character variable holding the formatted value of AGE */
   AGEGROUP = PUT(AGE, AGEGROUP.);
DATALINES;
01 1 C 45 4 2
02 2 A 34 5 4
03 1 C 67 3 4
04   N 18 5 5
05 9 H 47 4 2
06 1 X 55 3 3
07 2   56 2 2
08     20 1 1
;
RUN;

PROC PRINT DATA=QUESTION NOOBS;
   TITLE 'Data listing with formatted values';
RUN;

This example demonstrates many of the basic features of PROC FORMAT and the use of formats in general. The first format you create, GENDER, associates the text strings 'Male' and 'Female' (placed in single or double quotation marks) with the numeric data values 1 and 2, respectively.

Missing values (.) are represented as 'Missing', and all other values (OTHER) are identified as 'Miscoded'. This is a numeric format. Note that this format is not connected to any variable in any data set; it just exists, waiting to be called into use. This is true of all the formats you create and is a point worth repeating.
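To illustrate that a format exists independently of any data set, the sketch below applies GENDER. to a variable in a different data set at print time. The data set STAFF and its variables NAME and SEX are hypothetical names invented for this illustration, and the step assumes the PROC FORMAT code above has already run in the same SAS session.

```sas
/* Hypothetical data set: STAFF, NAME, and SEX are illustrative names only */
DATA STAFF;
   INPUT NAME $ SEX;
DATALINES;
Ann 2
Bob 1
Cy .
Dee 7
;
RUN;

/* GENDER. was never tied to a variable, so any numeric variable can use it */
PROC PRINT DATA=STAFF NOOBS;
   FORMAT SEX GENDER.;
RUN;
```

In this listing SEX should print as Female, Male, Missing, and Miscoded, even though GENDER. was written with a different variable in mind.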

The next format you create, $RACE, is a character format. Here you begin the format name with a dollar sign ($) to denote it as a character format (to be used with character variables) and place the data values to be formatted in single or double quotation marks. RACE codes other than 'C', 'A', 'H', or 'N' are associated with 'Other', and missing values (' ') are associated with 'Missing'.

The $LIKERT and AGEGROUP formats are created similarly. In the numeric AGEGROUP format, you assign ranges of values to specific labels. The less-than sign in a range such as 20-<40 excludes the upper endpoint, so an age of exactly 40 falls in the '40 to <60' group, while 60-HIGH includes 60. This format is used in the DATA step to create a new grouped variable.

At this point in the code, the formats are created. Now you have to use them. The DATA step reads in the raw data and permanently makes the following format assignments: format GENDER. to variable GENDER, format $RACE. to variable RACE, and format $LIKERT. to variables SATISFY and TIME. Note that $LIKERT. was assigned to more than one variable, which is perfectly legal.
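One way to confirm that the format assignments were stored with the data set is to inspect its descriptor with PROC CONTENTS. This is a minimal sketch, assuming the QUESTION data set from the program above exists in the WORK library.

```sas
/* Show the descriptor portion of QUESTION. The Format column should */
/* list GENDER., $RACE., and $LIKERT. next to the assigned variables. */
PROC CONTENTS DATA=QUESTION;
RUN;
```

Variables with no assignment in the DATA step, such as ID and AGE, should show no format in that column.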

Of much more importance to note is the period (.) after the format names in the FORMAT statement. You do not define the formats with a period in the PROC FORMAT code, but you must include the period whenever you refer to the format in subsequent code. This is a very common beginner's mistake, and most of you will make it. We did, and we still do from time to time. Since you link the variables to the formats in the DATA step, you do not have to do so in the PROC PRINT code.
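Because the assignments travel with the data set, a procedure needs its own FORMAT statement only when you want something different for that one step. The sketch below, which assumes the QUESTION data set and formats created above, lists the raw codes for GENDER and RACE by naming the variables in a FORMAT statement with no format after them. The title text is made up for the illustration.

```sas
PROC PRINT DATA=QUESTION NOOBS;
   /* Variables listed with no format name print unformatted in this step only */
   FORMAT GENDER RACE;
   TITLE 'Data listing with raw GENDER and RACE codes';
RUN;
```

The assignments stored in QUESTION are not changed by this step; a later procedure run without a FORMAT statement shows the formatted values again.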

To finish the example, you use the AGEGROUP. format with a PUT function to create a character variable (also conveniently called AGEGROUP) based on the formatted values of the numeric variable AGE (see Chapter 2, "Data Recoding," Example 3, for more information on this technique). The listing from this program follows:
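To see how the PUT function and the range boundaries interact, here is a small sketch that writes the formatted value of a few test ages to the SAS log. The ages are made up for illustration, and the step assumes AGEGROUP. has been defined as in the PROC FORMAT code above.

```sas
DATA _NULL_;
   /* Test ages chosen to sit on or near the range boundaries */
   DO AGE = 19, 20, 39.5, 40, 60, 85;
      /* PUT function: returns the formatted value as a character string */
      AGEGROUP = PUT(AGE, AGEGROUP.);
      /* PUT statement: writes the named values to the log */
      PUT AGE= AGEGROUP=;
   END;
RUN;
```

With the ranges as defined, age 20 should fall in '20 to <40' and age 40 in '40 to <60', because the less-than sign excludes the upper endpoint of each range.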

Output from Example - Formatting Values in a Questionnaire

Notice that PROC FORMAT produces no output of its own, with one optional exception, which is covered in the last two examples in this chapter.
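One commonly used way to make PROC FORMAT print something is the FMTLIB option, which lists the contents of stored formats. Whether this is the exception the text has in mind is an assumption; the sketch below simply shows the option, assuming the formats above were stored in the default WORK.FORMATS catalog.

```sas
PROC FORMAT FMTLIB;
   /* Limit the listing to two of the formats defined earlier */
   SELECT GENDER $RACE;
RUN;
```

Omitting the SELECT statement lists every format in the catalog.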