Suppose you have employee records in a SAS data set EMPLOY that contains the variables ID, GENDER(l=male, 2=female), DEPT(department code), SALARY (yearly salary),YEARS(number of years employed with the firm), and STATE 2 digit state of residence code).You want to create a new SAS data set, NJEMPLOY, which is a subset of EMPLOY made up of only those employees who live in New Jersey.You want NJEMPLOY to contain all the variables from the original EMPLOY file.There are several ways to accomplish this. Here's one:
ExampleLIBNAME MARY 'C:EMPLOYEEJQBDATA';
Reading Observations from an Existing SAS Data Set
Before we discuss the process of selecting a subset of data,we have to talk about how you access the existing SAS data set that you are going to subset. This process is handled by the SET statement.Think of the SET statement as being similar to an INPUT statement,with the difference being that a SET statement reads records from a SAS data set while an INPUT statement reads raw data from an external file (or instream data).
In this case,the data being used to create the SAS data set NJEMPLOY are being read in from the existing SAS data set EMPLOY.But what about that LIBNAME statement? In,"INPUT and INFILE,"you used a FILENAME statement to create a fileref (file reference).This enabled you to access raw data stored external to the code you were writing.
The LIBNAME statement is similar in that it creates a libref (library reference) to refer to an external SAS data library where you store SAS data sets (along with other SAS entities).Here,the LIBNAME statement creates the libref called MARY that refers to the SAS data library called
C:EMPLOYEEJOBDATA which contains a SAS data set called EMPLOY,possibly among other things. If it all seems a little confusing,it's because it all is a little confusing. Don't worry though; it all does make sense and will become second nature to you.
Subsetting IF Statement
When an IF statement is used in this context, it is called a subsetting IF statement. It is shorthand for,IF NOT condition THEN DELETE;(i.e., if this condition is true,continue processing the remaining statements in the DATA step; if not, delete this observation and return to the top of the DATA step).
In this case,it stands for IF NOT (STATE EQ 'NJ') THEN DELETE;.The resulting SAS data set NJEMPLOY contains only those observations from EMPLOY where STATE is equal to NJ.Subsetting IF statements often do the job quite adequately, but there is an alternative method called a WHERE statement.(There is also a WHERE data set option available, but this time you use the statement version.) Here is the code to accomplish the subsetting you want by using a WHERE statement:
ExampleLIBNAME MARY 'C:EMPLOYEEJOBDATA';
fferences between IF and WHERE Statements
This program is identical to the previous example except that the sub setting IF statement is replaced with a WHERE statement.Are there important differences? Both of these programs produce identical results,but there are differences between IF and WHERE statements.
The WHERE statement may be more efficient then the subsetting IF (especially if you are taking a very small subset from a large file) because it checks on the validity of the condition before the observation is brought into a temporary holding area, whereas the subsetting IF statement brings in the entire observation and then checks the condition to see if the observation is to be kept or not.
This temporary holding area is called the program data vector (PDV). A WHERE statement can only be used with variables in the existing data set, whereas a subsetting IF statement can be used with raw data as well.
Another difference between a subsetting IF statement and a WHERE statement may surface when you use the FIRST.and LAST.logical variables when the WHERE condition is not true,the observation is not brought into the PDV,and therefore it does not affect the logical values of the FIRST.and LAST.variables.Another major difference between IF and WHERE statements is that you may include WHERE statements in SAS procedures! For example,if you have a data set called ALL (containing the variables ID, SEX, and SALARY),and you want a listing only for MALES (M), you couldcode this as:
This saves you the work of creating a new data set just to obtain your listing
WHERE Statement Operators
Before we consider variations on this example,let's discuss some of the special WHERE statement operators.Suppose you want to select only people between the ages of 20 and 40 (inclusive).There are several ways to code this using the WHERE statement.You could write:WHERE AGE GE 20 AND AGE LE 40;
The BETWEEN values can be constants (numeric or character) or expressions.
Some useful WHERE operators and their actions are as follows:
KEEP and DROP Statements
Now let's get back to the example at hand. Let's modify this program in several ways. First,suppose that you only require the variables ID and DEPT in the new data set.You can easily accomplish this with a KEEP statement or a KEEP= data set option.First,let's see how KEEP and DROP statements work.
When you create a new data set from an old one, you may want to keep only selected variables.You have several ways to do this.A KEEP statement placed anywhere in the DATA step causes only those variables listed to be written to the newly created SAS data set. For example, if you want to keep only ID and DEPT in the new data set (and also want to do the NJ subsetting),you can do it like this:
ExampleLIBNAME MARY 'CtXEMPIiOYEEXJOBDATA';
Data set NJEMPLOY contains only the variables ID and DEPT and only those observations for which the STATE code is equal to NJ.If you want to keep most of the variables from the old data set in the one you are creating,you may choose to use a DROP statement to exclude the variables you don't want.
There is absolutely no difference between keeping the variables you want or dropping the variables you don't want.The choice usually depends on which list is shorter to write.You can replace the previous KEEP statement with the equivalent DROP statement:DROP GENDER SALARY YEARS STATE;
Since SAS programmers (actually, all programmers that we know) don't like to type any more than necessary,the KEEP statement is the preferred one to use here.When in doubt about which to use, give the edge to the KEEP statement since it is clearer to see which variables will be in the new data set.
KEEP= and DROP= Data Set Options
An alternative to a KEEP or DROP statement is a KEEP= or DROP= data set option.The variables to be kept or dropped are listed following the data set name,in parentheses.Following is a program that produces identical results to the previous one except that it uses a KEEP= data set option instead of a KEEP statement.In this case,the data set option is more efficient than the statement version.First we show the code and then the explanation.
ExampleLIBNAME MARY ' C:EMPLOYEEJOBDATA' ;
The KEEP= data set option instructs the DATA step to read only the variables ID,DEPT, and STATE into the PDV (remember this stands for the program data vector) for processing.Only those variables listed in the KEEP= option list are available in the DATA step, and only those same variables are written out to the SAS data set you are creating.
Using the KEEP= data set option instead of the KEEP statement allows the software to deal with fewer variables in the PDV and is therefore more efficient. Now you understand the difference between the KEEP= data set option and the KEEP statement.
But why include STATE in the KEEP= data set option if you only want to keep DEPT and ID in the new data set you are building? The answer lies in the WHERE statement. In order to test the value of STATE, the SAS System has to have it included in the PDV.
If you omit STATE from the KEEP= list,the WHERE statement will not work, and the log will show you the error of your ways.Since you do not want the variable STATE in your new data set,use a DROP statement to indicate that it is not to be written to the data set NJEMPLOY.For the last subsetting example, suppose you want to choose records from the states of NY, NJ, FL, and CA. You can do this the long way as follows:WHERE STATE EQ'NJ' OR STATE EQ'NY' OR
or you can do it the short way like this:WHERE STATE IN ('NJ','NY','FL','CA');
The two expressions produce identical results.The IN operator also works with numeric variables.To select observations for ID values 3, 7, and 9, you would write:WHERE ID IN (3, 7, 9);
You may use either commas or spaces to separate the numeric values.
SAS Programming Related Tutorials
|SASS (Syntactically Awesome Style sheets) Tutorial||R Programming language Tutorial|
SAS Programming Related Interview Questions
|Logistics Interview Questions||SAS Programming,SQL server Interview Questions|
|Clinical SAS Interview Questions||SASS (Syntactically Awesome Style sheets) Interview Questions|
|SQL Server Analysis Services (SSAS) Interview Questions||R Programming language Interview Questions|
|SAS DI Interview Questions||Advanced SAS Interview Questions|
|Base Sas Interview Questions||SAS Macro Interview Questions|
|Clinical Data Management Interview Questions|
Sas Programming Tutorial
Input And Infile
Set, Merge, And Update
Table Lookup Tools
Proc Means And Proc Uimivariate
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.