The present example is slightly more advanced than others in this book, and it covers a somewhat esoteric topic. You may want to skip it for now (or forever), and that's just fine. We use the term "fuzzy" in this example to refer to the merging of similar but not exact matches between two files.
The most common use for "fuzzy" merges is when you use a name as the matching variable. In this example, we merge data from two files, using similar-sounding names and date of birth as the matching variables. SAS System Releases 6.07 and higher support a SOUNDEX function, which follows the algorithm described in D. E. Knuth's book, "The Art of Computer Programming, Volume 3: Sorting and Searching," Reading, MA: Addison-Wesley.
This algorithm discards most vowels and substitutes numbers for groups of like-sounding consonants. The result is that like-sounding words or names will translate to the same SOUNDEX description. In the following program, you use the SOUNDEX function to translate the names in both data sets to their SOUNDEX equivalents and then merge the two data sets by the SOUNDEX name and the date of birth.
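To see the translation at work, here is a minimal sketch that prints a few names next to their SOUNDEX equivalents. The data set name DEMO and the sample names are assumptions made up for illustration; only the SOUNDEX function and the variable name S_NAME come from the text.

```sas
/* Minimal sketch: list a few names with their SOUNDEX equivalents.  */
/* DEMO and the sample names are hypothetical.                       */
data demo;
   length name $ 12 s_name $ 12;
   input name $;
   s_name = soundex(name);   /* SOUNDEX equivalent of NAME */
datalines;
Smith
Smythe
Friedman
Freedman
;
run;

proc print data=demo noobs;
   title "Names and their SOUNDEX equivalents";
run;
```

Names that sound alike, such as the two spellings in each pair above, should show the same value of S_NAME in the listing.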
The use of another variable such as date of birth is necessary because there may be too many like-sounding names in the two data sets to be merged, but it would be unlikely to have like-sounding names with the same date of birth in both data sets. Depending on the size of the two files, you may need to use additional variables to aid in the merge.
Data sets ONE and TWO are used to illustrate a "fuzzy" merge.
The program is shown next:
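What follows is a minimal sketch of such a program, not a definitive listing. The data set names ONE, TWO, ONE_TEMP, and TWO_TEMP and the variables NAME, S_NAME, NAME_ONE, and NAME_TWO come from the text. The date-of-birth variable name DOB, the extra variables GENDER and WEIGHT, the merged data set name BOTH, and all sample values are assumptions for illustration.

```sas
/* Hypothetical sample data: DOB, GENDER, WEIGHT, and all values are assumed. */
data one;
   length name $ 10;
   input name $ dob : mmddyy10. gender $;
   format dob mmddyy10.;
datalines;
Friedman 10/21/1946 M
Shields 12/12/1950 F
MacArthur 03/03/1947 M
Adams 02/04/1955 F
;
run;

data two;
   length name $ 10;
   input name $ dob : mmddyy10. weight;
   format dob mmddyy10.;
datalines;
Freedman 10/21/1946 155
Sheilds 12/12/1950 102
McArthur 03/03/1947 250
Jones 05/06/1960 180
;
run;

/* Rename NAME so both original spellings survive the merge, */
/* and add S_NAME, the SOUNDEX equivalent of the name.       */
data one_temp;
   set one(rename=(name=name_one));
   length s_name $ 10;
   s_name = soundex(name_one);
run;

data two_temp;
   set two(rename=(name=name_two));
   length s_name $ 10;
   s_name = soundex(name_two);
run;

/* MERGE with a BY statement requires both inputs sorted by the BY variables. */
proc sort data=one_temp;
   by s_name dob;
run;

proc sort data=two_temp;
   by s_name dob;
run;

/* Keep only observations found in both data sets. */
data both;
   merge one_temp(in=in_one) two_temp(in=in_two);
   by s_name dob;
   if in_one and in_two;
run;

proc print data=both noobs;
   title "Observations matched on SOUNDEX name and date of birth";
run;
```

The IN= variables restrict the result to observations present in both data sets. Dropping the subsetting IF statement would instead keep unmatched observations from either side.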
You start out by creating two new data sets, ONE_TEMP and TWO_TEMP, from the two data sets you want to merge, ONE and TWO. Each of the new data sets contains all the variables from the original plus a new variable, S_NAME, which is the SOUNDEX equivalent of NAME.
You rename the variable NAME in each of the data sets (to NAME_ONE and NAME_TWO) so that you can keep the original names from the original data sets and thereby see which names were actually matched. If you did not rename the variable NAME, then only the value of NAME from data set TWO would remain in the merged data set. The new data sets are used for your merging process but are not saved.
Here are the two data sets ONE_TEMP and TWO_TEMP, followed by the resulting merged data set:
There are three matches between these two data sets where both the SOUNDEX equivalent and the date of birth are the same. Keep in mind that this program is only an example of how to match observations from multiple files on inexact criteria and is only intended to serve as a model of how "fuzzy" matching is performed. You can expect a certain percentage of incorrect matches with procedures such as this, and careful testing must be performed to determine how to perform such a matching task.
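One simple way to begin such testing is to list the matched pairs whose original spellings differ, so that they can be checked by eye. The sketch below is only an illustration: the data set names BOTH and REVIEW, the variable DOB, and the sample values are hypothetical, while NAME_ONE and NAME_TWO are the renamed variables described above.

```sas
/* Hypothetical merged data set; BOTH, DOB, and the values are assumed. */
data both;
   length name_one name_two $ 10;
   input name_one $ name_two $ dob : mmddyy10.;
   format dob mmddyy10.;
datalines;
Friedman Freedman 10/21/1946
Adams Adams 02/04/1955
;
run;

/* Keep only the pairs whose original names are not spelled identically. */
data review;
   set both;
   if upcase(name_one) ne upcase(name_two);
run;

proc print data=review noobs;
   title "Matched pairs whose original names differ";
run;
```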
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd