Adding Variables from One Data Set to Another Based on an Identifying Variable - SAS Programming

Now that you understand the basics of merging,let's add a BY statement which allows you to correctly match up observations from two data sets based on a common variable.For this example,consider two SAS data sets: data setDEMOG(demographic data)contains ID,GENDER,and STATE,and is sorted by GENDER; data set EMPLOYEE contains ID,DEPT,and SALARY,and is sorted by ID.

You want a composite report which lists all the variables for each em ployee.To do this,you have to match the records in the DEMOG data set to the
corresponding records in the EMPLOYEE data set (by ID) so you can list values for GENDER and STATE along with values for DEPT and SALARY for each employee.

Here are the two data sets:

two data sets

Notice that,although both data sets contain ID,which is the " matching"variable,the observations in the DEMOG data set are not in order by ID. If you conducted a simple one-to-one merge, you would be combining the wrong observations.You have to make sure both data sets are sorted by ID and then match-merge them.Here is the code:

Example

PROC SORT DATA=DEMOG;
BY ID;
RUN;
PROC SORT DATA^EMPLOYEE;
BY ID;
RUN;
DATA COMBINED;
MERGE DEMOG EMPLOYEE;
BY ID;
RUN;
PROC PRINT DATA=COMBINED;
TITLE 'DEMOG-EMPLOYEE Match-Merged Data';
RUN;

You use the SORT procedure to sort both data sets by ID.Actually,in this example,the second SORT procedure (for data set EMPLOYEE) does not take place because this data set is already in ID order.In SAS Releases 6.07 and higher,a sort flag is stored with each data set,indicating if it is sorted and how.If you include a SORT procedure to sort an already sorted data set again in the same way,the sort does not take place and a note is written to the SAS log informing you of this.You use the BY statement to instruct the system to match-merge records from both data sets on the matching variable(ID).The output of the previous code is as follows:

Output from Example - Adding Variables from One Data Set to Another Based on an Identifying Variable

Output from Example - Adding Variables from One Data Set to Another Based on an Identifying Variable

As you can see,the resulting data set contains all the variables from the two merged data sets, matched by ID and occurring in ID sorted order.But what if there were a different number of observations in one of the data sets?Suppose,for example,that DEMOG contains more records than EMPLOYEE(perhaps DEMOG contains all employees ever associated with the company and EMPLOYEE only contains active employees).Let's add a record (ID=4) to DEMOG (we'll call it DEMOG2) to see what happens when there is no corresponding record in EMPLOYEE.

Here are the contents of data set DEMOG2, along with data set EMPLOYEE:

data set EMPLOYEE

You need only change the code by substituting DEMOG2 for DEMOG and making a few cosmetic changes:

Example

PROC SORT DATA=DEMOG2;
BY ID?
RUN;
PROC SORT DATA=EMPLQYEE;
BY ID;
RUN;
DATA COMBINED;
MERGE DEMOG2 EMPLOYEE;
BY ID;
RUN;
PROC PRINT DATA=COMBINED;
TITLE 'DEMOG2-EMPLOYEE Match-Merged Data1;
RUN;

Here is the new combined data set:

Output from Example - Adding Variables from One Data Set to Another Based on an Identifying Variable

Output from Example - Adding Variables from One Data Set to Another Based on an Identifying Variable

As you might have expected,the combined data set contains the data for ID number 4 from the DEMOG2 data set and missing values for DEPT and SALARY (represented by a blank for the character variable DEPT and by a period for the numeric variable SALARY).


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SAS Programming Topics