Controlling Which Observations are Added to the Merged Data Set - SAS Programming

Situations like the previous one in Example are commonplace wherein you have unequalsets of observations in multiple data sets to be merged into one.If your goal is for the merged data set to contain only those observations which exist in one of the data sets being merged,all is not lost.SAS software can easily handle that by making use of special IN= variables.Let's get to the example.Using the two files DEMOG2 and EMPLOYEE,let's see how you can control which observations get included in the merged data set.Here are the two data sets after they have been sorted:

Controlling Which Observations are Added to the Merged Data Set

Employee 4,which exists in the DEMOG2 data set,does not exist in the EMPLOYEE data set.When you perform a match-merge using a BY variable,the SAS System can check if an observation is being contributed from each data set in the MERGE list.For example,for ID = 1,you have an observation from both data sets; for ID = 4 you only have a contribution from the DEMOG2 data set.You can test for this as follows: (Assume you have already sorted both data sets by ID.)

Example

DATA BOTH;
MERGE DEMOG2 EMPLOYEE {IN=EMP);
BY ID;
IF EMP»lj
RUN;

The variable name following the IN= data set option, BMP in this case, refers to a logical variable which will be created with a value of true (1) or false (0).As each observation is built,if EMPLOYEE has data to contribute, EMP will equal 1; if not, EMP will equal 0.These IN= variables are temporary in that they exist only during the execution of the DATA step.

They are not saved with the data set(s) upon completion.In this example, EMP is true for every value of ID except for ID = 4.You use a subsetting IF (remember that?) to include only obser vations where EMP is true.The resulting data set BOTH contains observations for ID's 1, 2, 3,and 5only.If you wanted to be sure that you were selecting employees which were in both the DEMOG2 file and the EMPLOYEE file, you would include an IN= option on both data sets like this:

Example

DATA BOTHj
MERGE DEMOG2 (IH=DEM) EMPLOYEE (I8=EMP}y
BY IDj
IF DEM*1 AND EMP*lj
HUN;

This would ensure that the values of ID added to the new data set BOTH existed in the DEMOG2 data set and the EMPLOYEE data set.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SAS Programming Topics