Using a User-Created Informat to Filter Input Data (Setting Invalid Values to Missing) - SAS Programming

Now you take a different approach: a simple and elegant method that performs the process more efficiently by changing the data on the way into the data set. Here you filter the data values as they are read in and do the necessary conversion at INPUT time by first creating a user-defined informat. As before, you proceed using two alternate methods. First, you will handle all values other than 'M' or 'F' (including missing) as a single group. The next example separates missing from invalid data. Here is the code for the first method:

Example

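PROC FORMAT;
   /* Keep 'M' and 'F' as read; set every other value to missing */
   INVALUE $GENDER 'M', 'F' = _SAME_
                   OTHER    = ' ';
RUN;

DATA SCREEN3;
   /* Read GENDER from column 4 through the user-defined informat */
   INPUT @1 ID 3.
         @4 GENDER $GENDER1.;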

Here's how it works. You first define an informat in the PROC FORMAT code by using an INVALUE statement instead of a VALUE statement. Since you are defining an informat that will result in a character value, you specify an informat name that starts with a dollar sign ($). (If you wanted the result to be a numeric value, you would leave off the beginning $.) Informat names can be only seven characters in length, including the $ if it is used. This is in contrast to format names, which can be up to eight characters in length (including the $ if it is used). The reason is that the system needs room to add an internal tag to the front of the name declaring it to be an informat; it actually adds an @, which you can see in system-generated messages. (These limits apply to older releases; SAS 9 and later allow considerably longer names.)
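As a minimal sketch of the numeric case mentioned above, the following defines an informat without the leading $, so the result is a numeric value. The informat name GENNUM, the data set name GENCODE, the codes 1 and 2, and the data lines are all hypothetical and chosen only for illustration.

```sas
PROC FORMAT;
   /* No leading $: the informatted result is numeric (hypothetical coding) */
   INVALUE GENNUM 'M'   = 1
                  'F'   = 2
                  OTHER = .;
RUN;

DATA GENCODE;
   /* GENDER becomes a numeric variable: 1, 2, or missing */
   INPUT @1 ID 3.
         @4 GENDER GENNUM1.;
DATALINES;
001M
002F
003X
;
RUN;
```

Note that the values on the left of the = are still quoted character strings, because they describe the raw data being read; only the result on the right is numeric.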

Following the informat name, you indicate specific values and/or ranges of values on the left and their corresponding resultant informatted values on the right (of the =). If a value is read in that matches a value in the list or lies within a specified range, the informatted value gets assigned to the variable.
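To illustrate a range on the left of the =, here is a small sketch that assumes a hypothetical numeric variable GROUP whose valid values are 1 through 20. The informat name GRPCHK, the data set name RANGES, and the data lines are invented for this example.

```sas
PROC FORMAT;
   /* Values in the range 1 to 20 are kept; anything else becomes missing */
   INVALUE GRPCHK 1 - 20 = _SAME_
                  OTHER  = .;
RUN;

DATA RANGES;
   /* Read a two-column group number through the informat */
   INPUT @1 ID 3.
         @4 GROUP GRPCHK2.;
DATALINES;
00105
00220
00345
;
RUN;
```

The third record holds a group number outside the range, so it falls under the OTHER assignment.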

In this example, if you wanted to store expanded versions of the 'M' and 'F', you might have coded 'M' = 'MALE', 'F' = 'FEMALE'. Since you are satisfied with the one-character data values, you choose not to change them. You accomplish this by using the keyword _SAME_, which instructs the system to leave these values intact. Values other than an 'M' or an 'F' are, however, set equal to missing because of the OTHER = ' ' assignment.

There are also other special keywords that can be used in the range specifications on the left of the =, such as HIGH and LOW. These can be used when creating formats as well as informats. Although you create your informat as $GENDER, you write it as $GENDER1. in the INPUT statement. Just as you do with formats, you must include a period (.) at the end of the informat when you use it in a subsequent DATA step but not when you create it. When creating informats or formats, you cannot end the name with a number.
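The LOW and HIGH keywords can be sketched as follows, assuming a hypothetical heart rate variable HR whose acceptable values run from 40 to 100. The informat name HRLIM, the data set name LIMITS, the cutoffs, and the data lines are illustrative assumptions.

```sas
PROC FORMAT;
   /* LOW and HIGH stand for the smallest and largest possible values */
   INVALUE HRLIM LOW -< 40   = .       /* below 40 (40 itself excluded)   */
                 40 - 100    = _SAME_  /* acceptable range, kept as read  */
                 100 <- HIGH = .;      /* above 100 (100 itself excluded) */
RUN;

DATA LIMITS;
   /* The name is created as HRLIM but used as HRLIM3. (width plus period) */
   INPUT @1 ID 3.
         @4 HR HRLIM3.;
DATALINES;
001072
002035
003250
;
RUN;
```

The informat name ends in a letter when it is created; the width and the period appear only where the informat is used.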

You can, however, append a number to user-defined, as well as system-supplied, formats and informats when you use them in subsequent DATA or PROC steps. With formats, the number determines the display width of the formatted variable. With informats, the number specifies how many characters to read from the input record. The default length for an informat is the longest informatted value (in this case it is equal to 1).

It is good coding practice to follow all user-defined informats with a number to ensure that you read the correct number of columns from the input record. You should, however, realize that, like a LENGTH statement, this can also establish the length of a variable.
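One way to keep the variable length explicit is to declare it with a LENGTH statement before the INPUT statement, and then check the result with PROC CONTENTS. The following is only a sketch: the data set name LENCHK and the data lines are hypothetical, and the informat is repeated here so that the example is self-contained.

```sas
PROC FORMAT;
   INVALUE $GENDER 'M', 'F' = _SAME_
                   OTHER    = ' ';
RUN;

DATA LENCHK;
   /* Declare the length explicitly rather than relying on the informat width */
   LENGTH GENDER $ 1;
   /* The 1 in $GENDER1. tells SAS to read one column from the input record */
   INPUT @1 ID 3.
         @4 GENDER $GENDER1.;
DATALINES;
001M
002F
003X
004
;
RUN;

/* Inspect the stored length of GENDER */
PROC CONTENTS DATA=LENCHK;
RUN;
```

The last two records contain an invalid value and a blank in column 4, so both fall under the OTHER assignment.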