Using a SAS Data Set to Create a Character Format - SAS Programming

In this example, you use a set of variables in an existing SAS data set to create a format. This is most useful when you have a long list of items and expansions, such as ICD-9 codes and descriptions. For those not familiar with the term, ICD-9 stands for the International Classification of Diseases, Ninth Revision. Here is a very short portion of the entire list.

Code  Description
072   Mumps
410   Heart Attack
487   Influenza
493   Asthma
700   Corns

If you have a relatively short list such as this one, you can go ahead and write a VALUE statement like this:

Example - Using PROC FORMAT Directly

PROC FORMAT;
   VALUE $ICDFMT '072'='Mumps'
                 '410'='Heart Attack'
                 '487'='Influenza'
                 '493'='Asthma'
                 '700'='Corns';
RUN;

When the list of codes is hundreds of entries long, as it is with the ICD-9, or even thousands of entries long, this gets tedious. There is a better way: the CNTLIN option of PROC FORMAT. If you have a SAS data set that contains the codes and descriptions needed, and that meets very rigid structural and naming conventions, PROC FORMAT can read this data set (called a control data set) and automatically create a format for you. Standard sets of codes and descriptions are usually available in some electronic form that can easily be transformed into a SAS control data set as the first step in creating a very useful data formatting tool.

We demonstrate this technique with two simple examples: the current one creates a character format and the next one creates a numeric format. These two short examples serve merely to introduce this powerful facility.

The following program first converts the previous ICD-9 raw data subset to a SAS data set, then to a SAS control data set, and then to a user-defined SAS format. It then uses the format to attach labels to sample raw data read into another SAS data set and prints the formatted values. The example is deliberately a little cumbersome for educational purposes, and it is explained after the code.

Example - Using a CNTLIN Data Set

DATA CODES;                                         /* 1 */
   INPUT @1 ICD9 $3.                                /* 2 */
         @5 DESCRIPT $12.;
   DATALINES;
072 MUMPS
410 HEART ATTACK
487 INFLUENZA
493 ASTHMA
700 CORNS
;
DATA CONTROL;                                       /* 3 */
   RETAIN FMTNAME '$ICDFMT'                         /* 4 */
          TYPE 'C';
   SET CODES (RENAME=(ICD9=START
                      DESCRIPT=LABEL));             /* 5 */
RUN;
PROC FORMAT CNTLIN=CONTROL;                         /* 6 */
RUN;
DATA EXAMPLE;                                       /* 7 */
   INPUT ICD9 $ @@;
   FORMAT ICD9 $ICDFMT.;                            /* 8 */
   DATALINES;
072 493 700 410 072 700
;
PROC PRINT NOOBS DATA=EXAMPLE;                      /* 9 */
   TITLE 'Using a Control Data Set';
   VAR ICD9;
RUN;

Well, now you've seen the code, and if that didn't frighten you away, here is the explanation. The first step is to turn the raw data into the SAS data set CODES, which holds the collection of codes and descriptions as SAS variables (1). Pretty straightforward stuff. You could have made things easier on yourself by naming the variables in the CODES data set START and LABEL, the names required for the control data set and the CNTLIN feature of PROC FORMAT, which is what this example is all about.
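As a rough sketch of that shortcut, the step below reads the raw data directly into variables named START and LABEL, so no renaming is needed later. The data set name CONTROL2 is a hypothetical name chosen for illustration; the rest mirrors the example above.

```sas
/* Sketch only: CONTROL2 is a hypothetical data set name.        */
/* The raw data is read directly into START and LABEL, so the    */
/* result is already a valid control data set.                   */
DATA CONTROL2;
   RETAIN FMTNAME '$ICDFMT' TYPE 'C';   /* same on every observation */
   INPUT @1 START $3.
         @5 LABEL $12.;
   DATALINES;
072 MUMPS
410 HEART ATTACK
487 INFLUENZA
493 ASTHMA
700 CORNS
;
RUN;

/* Build the $ICDFMT. format from the control data set */
PROC FORMAT CNTLIN=CONTROL2;
RUN;
```

This is shorter, but it works only when you control the variable names at the point where the data is read.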

You chose, however, to give them other, more realistic names (2) and then rename them (5). This makes the coding more general, and more like the situation you may encounter if you ever need to convert an existing SAS data set with previously assigned variable names into a control data set. The next step is to create the control data set (which you cleverly call CONTROL) (3). The control data set consists of one observation for each pair of codes and descriptions. Each observation must contain a specific set of variables with prescribed names. These are as follows:

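Variable Name   Description of the Variable
FMTNAME         Name of the format to create
TYPE            Type of format: 'C' for character, 'N' for numeric
START           Value to be formatted (here, the ICD-9 code)
LABEL           Formatted value associated with START (here, the description)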

You use a RETAIN statement (4) to assign values to the variables FMTNAME and TYPE, since they are the same on every observation. You could use assignment statements instead, but the RETAIN statement is more efficient because the values are set once rather than reassigned on every iteration of the DATA step. After the control data set is created, you merely have to tell PROC FORMAT to use it to create a format. This is done with the CNTLIN option (6). You don't even have to name the format you are creating; it's all in the control data set. At this point in the example, you have a character format called $ICDFMT. created and waiting to be used.
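For comparison, the assignment-statement alternative might look like the following sketch. It rebuilds CODES so that it can run on its own, and it should produce the same control data set, apart from the order of the variables.

```sas
/* Rebuild CODES so this sketch is self-contained */
DATA CODES;
   INPUT @1 ICD9 $3.
         @5 DESCRIPT $12.;
   DATALINES;
072 MUMPS
410 HEART ATTACK
487 INFLUENZA
493 ASTHMA
700 CORNS
;
RUN;

/* Same control data set, built with assignment statements */
DATA CONTROL;
   SET CODES (RENAME=(ICD9=START DESCRIPT=LABEL));
   FMTNAME = '$ICDFMT';   /* executed once per observation */
   TYPE    = 'C';         /* executed once per observation */
RUN;

PROC FORMAT CNTLIN=CONTROL;
RUN;
```

With only five observations the difference is negligible; it starts to matter only for control data sets with many thousands of entries.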

You test your new format by creating another SAS data set called EXAMPLE from a collection of sample ICD-9 data (7). It's important to realize that this represents a set of actually collected data, whereas the data used to create the data set CODES is the set of all possible unique codes with their descriptions. You permanently associate the format with the variable ICD9 in the data set EXAMPLE with the FORMAT statement (8). The stored values remain the codes; only their display changes. The last step is to print the data set EXAMPLE (9), with the following result:
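If you would rather keep the code and its description as two separate variables, one option is the PUT function, which applies a format and returns the result as a character value. The sketch below assumes the $ICDFMT. format has already been created in the current session; the names EXAMPLE2 and DIAGNOSIS are hypothetical.

```sas
/* Sketch only: assumes $ICDFMT. already exists in this session. */
/* EXAMPLE2 and DIAGNOSIS are hypothetical names.                */
DATA EXAMPLE2;
   INPUT ICD9 $ @@;
   LENGTH DIAGNOSIS $12;               /* room for the longest label */
   DIAGNOSIS = PUT(ICD9, $ICDFMT.);    /* look up the description    */
   DATALINES;
072 493 700 410 072 700
;
RUN;

PROC PRINT NOOBS DATA=EXAMPLE2;
   VAR ICD9 DIAGNOSIS;
RUN;
```

Here ICD9 carries no format, so it prints as the raw code, and DIAGNOSIS holds the description as stored data.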

Output from Example - Using a SAS Data Set to Create a Character Format

Using a Control Data Set

ICD9
MUMPS
ASTHMA
CORNS
HEART ATTACK
MUMPS
CORNS

One very useful option with PROC FORMAT is the FMTLIB option, which gives you a descriptive listing of your format. In the previous program, you could have added this option as follows:

PROC FORMAT CNTLIN=CONTROL FMTLIB;

This would have generated the following output:

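Output from Example - Using a SAS Data Set to Create a Character Format (FMTLIB listing, START, END, and LABEL columns only)

START   END     LABEL
072     072     MUMPS
410     410     HEART ATTACK
487     487     INFLUENZA
493     493     ASTHMA
700     700     CORNS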

Note that the START value is repeated in the END column because you are not using ranges, but rather discrete values.
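To see START and END differ, you can add an END variable to the control data set so that each observation covers a range of codes. The sketch below is illustrative only: the data set name RANGES and the format name $ICDGRP are hypothetical, and the two ranges are the ICD-9 chapters for diseases of the circulatory system (390-459) and the respiratory system (460-519).

```sas
/* Sketch only: RANGES and $ICDGRP are hypothetical names.       */
/* Each observation maps a range of codes (START through END)    */
/* to a single label.                                            */
DATA RANGES;
   RETAIN FMTNAME '$ICDGRP' TYPE 'C';
   INPUT @1 START $3.
         @5 END   $3.
         @9 LABEL $20.;
   DATALINES;
390 459 CIRCULATORY SYSTEM
460 519 RESPIRATORY SYSTEM
;
RUN;

/* Create the format and list it; START and END now differ */
PROC FORMAT CNTLIN=RANGES FMTLIB;
RUN;
```

Because this is a character format, the ranges are compared as character strings rather than as numbers, which is safe here only because every code has the same three-character layout.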


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
