Creating a Summary Data Set That Contains a Median - SAS Programming

In this last example you produce summary statistics and a report that includes a median. You use PROC UNIVARIATE instead of PROC MEANS because you want to compute medians as well as means, and in older releases of SAS, PROC UNIVARIATE was the only procedure that could easily compute medians and output them to a SAS data set. (In more recent releases, PROC MEANS and PROC SUMMARY can also compute the MEDIAN and other percentiles.)

When you use this procedure to output statistics to a SAS data set, the syntax is almost identical to that of PROC MEANS. PROC UNIVARIATE and PROC MEANS have slightly different sets of statistics that can be output, and in older releases you cannot use a CLASS statement with PROC UNIVARIATE, so you are restricted to a BY statement (newer releases accept CLASS as well). The example that follows was thought up around 3:00 a.m. one night by one of the authors (RC), who had trouble sleeping because of a MEAN head cold. The reader should be able to see the effects of too many clinical symptoms, too much antihistamine and decongestant medication, and too little sleep. The pun on MEAN was thought up by the other author (RP) after a full night's sleep, and he accepts full responsibility.

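You have a SAS data set CLINICAL that contains information on patient visits to a physician. As you can see from the data, the number of visits varies from patient to patient. The variables used in this example are:

  Variable Name   Description
  PATNUM          patient number
  DATE            date of the visit
  ROUTINE         whether the visit was routine (Y or N)
  CHOL            cholesterol measurement
  SBP             systolic blood pressure
  DBP             diastolic blood pressure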

Here is a listing of the CLINICAL SAS data set:

(Listing of the CLINICAL SAS data set)

For this example, you want to create a new data set which contains one record for each patient. Each record is to contain:

  1. patient number
  2. date of last visit
  3. last cholesterol measurement
  4. mean values of the following for each patient: cholesterol, SBP (systolic blood pressure), and DBP (diastolic blood pressure)
  5. median cholesterol value for each patient
  6. the proportion of visits that were routine (ROUTINE = Y).

Let's examine the needed items one step at a time. Item 1 is easy and comes as a by-product of some other operations; we'll note it in passing. Items 2 and 3 are obtained by sorting the data set by PATNUM (patient number) and DATE and then selecting the last record for each patient.

Items 4 and 5 are computed by PROC UNIVARIATE and written out to a data set. To compute item 6, you need to create a numeric variable that has the values 0 and 1 corresponding to the character values 'N' and 'Y' in the ROUTINE variable. By doing this, you can calculate the mean of the numeric variable (let's call it RATIO), which will be the proportion of visits that are routine (that is, the proportion of observations where ROUTINE='Y'). The program and accompanying explanation will get it all done, but not necessarily in the order of the items listed here. Here is the program:

Example

* Create a data set with the last record for each patient;
DATA LAST (RENAME=(DATE=LASTDATE
                   CHOL=LASTCHOL));
   SET NEW_CLIN (KEEP=PATNUM DATE CHOL);
   BY PATNUM;
   IF LAST.PATNUM;
RUN;
* Output means and medians for each patient to a data set;
PROC UNIVARIATE DATA=NEW_CLIN NOPRINT;
   BY PATNUM;
   VAR CHOL SBP DBP RATIO;
   OUTPUT OUT=STATS
          MEAN=MEANCHOL MEANSBP MEANDBP RATIO
          MEDIAN=MEDCHOL;
RUN;
* Combine the LAST data set with the STATS data set;
DATA FINAL;
   MERGE STATS LAST;
   BY PATNUM;
RUN;
* Print a final report;
PROC PRINT DATA=FINAL LABEL DOUBLE;
   TITLE 'Listing of data set FINAL in Example 7';
   ID PATNUM;
   VAR LASTDATE LASTCHOL MEANCHOL MEDCHOL
       MEANSBP MEANDBP RATIO;
   LABEL LASTDATE='Date of Last Visit'
         MEANCHOL='Mean Chol'
         MEANSBP ='Mean SBP'
         MEANDBP ='Mean DBP'
         MEDCHOL ='Median Chol'
         LASTCHOL='Last Chol'
         RATIO   ='Proportion of visits that were routine';
   FORMAT MEANCHOL MEANSBP MEANDBP MEDCHOL LASTCHOL 5.0
          RATIO 3.2;
RUN;
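The listing above starts with the DATA LAST step, so the two earlier steps that the explanation below walks through (the DATA step that builds NEW_CLIN and the PROC SORT) do not appear in it. What follows is a minimal sketch of those steps reconstructed from that explanation, not the authors' exact code. It assumes ROUTINE holds 'Y' or 'N'. The alternatives in the comment box are illustrative. The small CLINICAL data set at the top is hypothetical and is included only so the sketch runs on its own.

```sas
/* Hypothetical stand-in for CLINICAL; skip this step if you have the real data */
DATA CLINICAL;
   INPUT PATNUM DATE : MMDDYY10. ROUTINE $ CHOL SBP DBP;
   FORMAT DATE MMDDYY10.;
   DATALINES;
1 03/15/2023 Y 210 130 80
1 01/10/2023 N 230 140 88
2 02/01/2023 Y 190 120 78
;
RUN;

/* Create NEW_CLIN with a numeric 0/1 copy of ROUTINE */
DATA NEW_CLIN;
   SET CLINICAL;
   /* TRANSLATE(source, to, from): 'N' becomes '0' and 'Y' becomes '1'; */
   /* INPUT then reads that digit with the 1. informat, giving 0 or 1.  */
   RATIO = INPUT(TRANSLATE(ROUTINE,'01','NY'),1.);
   /************************************************
   *  Illustrative alternatives (not used here):   *
   *     IF ROUTINE = 'Y' THEN RATIO = 1;          *
   *     ELSE IF ROUTINE = 'N' THEN RATIO = 0;     *
   *  or                                           *
   *     RATIO = (ROUTINE = 'Y');                  *
   ************************************************/
RUN;

/* Sort so that each patient's last observation is the latest visit */
PROC SORT DATA=NEW_CLIN;
   BY PATNUM DATE;
RUN;
```

The alternatives differ in one respect. The Boolean form sets RATIO to 0 when ROUTINE is blank. The TRANSLATE/INPUT and IF/ELSE versions leave RATIO missing in that case, so a blank value drops out of the mean instead of counting as a non-routine visit.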

The first DATA step, which is not reproduced in the listing above, creates a new data set called NEW_CLIN from CLINICAL. The sole purpose of this step is to create a numeric variable RATIO so that you can use the mean of this variable to indicate the proportion of visits that were routine. It's a simple concept: for any set of values of a binary variable (values 0 or 1), the sum equals the number of values equal to 1, and the mean equals the proportion of values equal to 1. Try it.
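To try it, here is a small sketch that uses five hypothetical 0/1 values, three of which are 1. The data set names DEMO and PROP and the output variable names N_ROUTINE and PROPORTION are made up for illustration.

```sas
/* Five hypothetical visits coded 1 = routine, 0 = not routine */
DATA DEMO;
   INPUT RATIO @@;
   DATALINES;
1 0 1 1 0
;
RUN;

/* SUM counts the 1s; MEAN is the proportion of 1s */
PROC MEANS DATA=DEMO N SUM MEAN;
   VAR RATIO;
   OUTPUT OUT=PROP SUM=N_ROUTINE MEAN=PROPORTION;
RUN;
```

PROC MEANS reports a sum of 3, which is the count of 1s, and a mean of 0.6, which is the proportion of 1s (3/5).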

The functions TRANSLATE and INPUT are used here to convert the character values 'N' and 'Y' to 0 and 1, respectively. While the alternative sets of code shown in the comment box are perhaps simpler to understand, the method used here gives you an opportunity to review two of the functions discussed in Chapter 5, "SAS Functions." The TRANSLATE function converts each character in the from string ('NY') to the corresponding character in the to string ('01'). Note that TRANSLATE takes its arguments in the order source, to, from. Thus, each 'N' becomes a '0' and each 'Y' becomes a '1'. The INPUT function then rereads these character values with the 1. informat, turning each character '0' into a numeric 0 and each character '1' into a numeric 1. A note about the comment box itself is in order here.

It works because the entire box starts with the comment-initiating character string (/*) and ends with the comment-terminating character string (*/). The rest of the box border is just some fancy comment fingerwork. You next sort the data set NEW_CLIN by PATNUM and DATE. The sort lets you use the SET-BY combination in the next DATA step, and it also ensures that, within each patient, the last observation is the most recent visit. This DATA step creates the data set LAST by using a SET statement on NEW_CLIN followed by a BY PATNUM statement. A BY statement following a SET statement creates two temporary SAS variables, FIRST.PATNUM and LAST.PATNUM. Because you want to keep only the most recent (that is, last) observation for each patient, you use the subsetting IF statement IF LAST.PATNUM. The RENAME= option on the DATA statement stores the kept DATE and CHOL values in LAST under the names LASTDATE and LASTCHOL.
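This sketch shows what the BY statement adds. It writes FIRST.PATNUM and LAST.PATNUM to the log for three hypothetical visits that are already sorted by PATNUM and DATE. The data set names VISITS and LAST_DEMO are illustrative.

```sas
/* Hypothetical visits, already sorted by PATNUM and DATE */
DATA VISITS;
   INPUT PATNUM DATE : MMDDYY10. CHOL;
   FORMAT DATE MMDDYY10.;
   DATALINES;
1 01/10/2023 230
1 03/15/2023 210
2 02/01/2023 190
;
RUN;

/* BY after SET creates the temporary variables FIRST.PATNUM and LAST.PATNUM */
DATA LAST_DEMO;
   SET VISITS;
   BY PATNUM;
   PUT PATNUM= DATE= FIRST.PATNUM= LAST.PATNUM=;
   IF LAST.PATNUM;   /* keep only each patient's final (latest) visit */
RUN;
```

Only patient 1's second visit (03/15/2023) and patient 2's single visit have LAST.PATNUM equal to 1, so LAST_DEMO ends up with those two observations. A patient with only one visit has both FIRST.PATNUM and LAST.PATNUM equal to 1.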

Keeping only the observations where LAST.PATNUM is 1 gives you just what you want, including PATNUM, which we said we'd note in passing: here it is. In the next part of the process, you use PROC UNIVARIATE to create an output data set STATS that contains the patient means for CHOL, SBP, DBP, and RATIO as well as the median CHOL reading. These are stored in MEANCHOL, MEANSBP, MEANDBP, RATIO, and MEDCHOL, respectively. The names after MEAN= are matched to the VAR variables in order, and the single name after MEDIAN= goes with the first VAR variable, CHOL. Notice that you use a BY statement with PROC UNIVARIATE here. In older releases that was the only choice, whereas PROC MEANS gives you the option of using CLASS or BY.
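In more recent SAS releases, PROC MEANS can also compute medians, so you can build a STATS-like data set with a CLASS statement instead of BY. This is a sketch under that assumption and is not part of the original example. DEMO_CLIN is a hypothetical stand-in for NEW_CLIN, and STATS2 is an illustrative output name.

```sas
/* Hypothetical data with the same variables as NEW_CLIN */
DATA DEMO_CLIN;
   INPUT PATNUM CHOL SBP DBP RATIO;
   DATALINES;
1 230 140 88 0
1 210 130 80 1
1 200 128 82 1
2 190 120 78 1
;
RUN;

/* CLASS needs no prior sort; NWAY keeps only the per-patient rows */
/* and drops the overall summary row                               */
PROC MEANS DATA=DEMO_CLIN NOPRINT NWAY;
   CLASS PATNUM;
   VAR CHOL SBP DBP RATIO;
   OUTPUT OUT=STATS2 (DROP=_TYPE_ _FREQ_)
          MEAN=MEANCHOL MEANSBP MEANDBP RATIO
          MEDIAN(CHOL)=MEDCHOL;
RUN;
```

Patient 1 has CHOL values 230, 210, and 200 and RATIO values 0, 1, and 1, so STATS2 holds MEDCHOL=210 and RATIO of about 0.67 for that patient. NWAY leaves one observation per PATNUM in STATS2, the same shape as STATS.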

The DATA step that creates FINAL merges the data set STATS, which contains the mean and median values, with the data set LAST, which contains the most recent visit date and cholesterol value. The merge uses the BY PATNUM statement to ensure that proper match-merging by patient takes place. Finally, the PROC PRINT statement uses the two options DOUBLE and LABEL.

You've seen these before, but it doesn't hurt to see them again. DOUBLE, as the name implies, double-spaces the output; LABEL tells the procedure to use variable labels instead of variable names as column headings. And here, at last, is the output. We hope it was worth the wait and the wade (through that deep code).

Output from Example - Creating a Summary Data Set Containing a Median

This example really shows the power of PROC MEANS and PROC UNIVARIATE for creating summary statistics. It also brings together several other techniques (LAST. variables, MERGE, functions, and so on).

Problems

  1. Given the data set GRADES, create a new data set MEAN_GRD that contains the separate mean test scores for boys and girls. Use the same variable name (SCORE) for the mean values in MEAN_GRD as for the raw data in GRADES, and let the procedure produce (print) output. The data set MEAN_GRD should contain only two observations.

     Note: Data set GRADES also contains the variable WEIGHT, which will be used in other problems.

     Here are some sample data to work with:

     (Data set GRADES)

  2. You have a SAS data set EXPER with variables GROUP (A or B), TIME (1 or 2), and SCORE. You want to plot the mean score at each time period for each group, with the value of GROUP as the plotting symbol.

     Your resulting plot should look like the following:

     (Resulting plot)

     Hint: Use PROC MEANS with a CLASS statement to compute the mean score for each combination of group and time. Use the PLOT procedure to produce the plot, with a PLOT statement of the form:

       PLOT y-axis variable * x-axis variable = plotting symbol variable;

     Here are some sample data to work with:

     (Data set EXPER)


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
