Creating Unweighted Summary Statistics (Step 1) - SAS Programming

This example demonstrates how to compute unweighted means for a population containing varying numbers of observations per subject. In other words, you want each subject to contribute equally to the overall mean.
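To make the distinction concrete, here is one way to write both means for a single year. The notation (S subjects, subject i having n_i readings x_ij) is introduced here for illustration only.

```latex
\begin{aligned}
\bar{x}_i &= \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} && \text{(mean for subject } i\text{)}\\
\bar{x}_{\text{unweighted}} &= \frac{1}{S}\sum_{i=1}^{S} \bar{x}_i && \text{(each subject counts once)}\\
\bar{x}_{\text{weighted}} &= \frac{\sum_{i=1}^{S}\sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{S} n_i}
  = \frac{\sum_{i=1}^{S} n_i\,\bar{x}_i}{\sum_{i=1}^{S} n_i} && \text{(all readings pooled)}
\end{aligned}
```

The last form shows that pooling all readings weights each subject's mean by n_i. For instance, with two hypothetical subjects, one with readings 120, 130, and 140 and one with a single reading of 110, the unweighted mean is (130 + 110)/2 = 120, while the pooled mean is (120 + 130 + 140 + 110)/4 = 125.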

You have to process the data twice: the first pass produces a mean for each subject, and the second pass uses these per-subject values to produce a mean over all subjects.

Suppose you have a data set, PRESSURE, that contains blood pressure readings for a number of subjects, but the number of observations per subject per year varies. You want a mean value of the readings for each year, but you want only one value per subject per year to enter that mean. The variables are SUBJ, YEAR, SBP (systolic blood pressure), and DBP (diastolic blood pressure).

Here are a few sample observations:


You first have to compute the mean SBP and DBP for each subject for each year and write these values to a new output data set. You then use this new data set to compute yearly means over all subjects. Note that you cannot simply use PROC MEANS with YEAR as a CLASS variable.

Doing this would include every reading in each year, including multiple readings per subject (with different numbers of readings per subject), and would therefore produce a weighted mean: subjects with more measurements in a year would make a larger contribution to that year's mean. This is generally not what is wanted.

Here is a set of programs that produce unweighted yearly means. The first program produces a data set, MEANOUT, containing the mean SBP and DBP for each subject for each year.


Note: data set PRESSURE does NOT have to be sorted, because PROC MEANS groups the observations by the CLASS variables itself.
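A minimal sketch of such a first program, assuming the options the text describes (NOPRINT and NWAY on the PROC MEANS statement, SUBJ and YEAR as CLASS variables, SBP and DBP on the VAR statement, and MEAN= with no names on the OUTPUT statement), might look like this. The readings in the DATA step are hypothetical, made up only so the code runs, and are not the article's sample observations.

```sas
* Hypothetical readings for illustration only, deliberately not sorted;
DATA PRESSURE;
   INPUT SUBJ YEAR SBP DBP;
   DATALINES;
1 1 120 80
2 2 115 75
1 1 130 90
2 1 110 70
1 2 125 85
1 1 140 100
2 2 135 85
;
RUN;

* Step 1: one mean per SUBJ-YEAR combination;
* CLASS (unlike BY) does not require PRESSURE to be sorted;
PROC MEANS DATA=PRESSURE NOPRINT NWAY;
   CLASS SUBJ YEAR;
   VAR SBP DBP;
   OUTPUT OUT=MEANOUT MEAN=;
RUN;
```

With these hypothetical readings, MEANOUT has four observations, one per SUBJ-YEAR combination; for subject 1 in year 1, SBP is the mean of three readings (130) and _FREQ_ is 3.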

Notice first the NOPRINT option on the PROC MEANS statement. This instructs the procedure not to print the resulting statistics. You don't need to see any output at this point, because you are only using the procedure to create another data set, which will then be processed to produce the values that you really want: the unweighted yearly means.

In this example, you do not specify variable names after MEAN= in the OUTPUT statement. As a result, the means of SBP and DBP in the output data set keep the names of the variables listed on the VAR statement, namely SBP and DBP. This is fine in a simple application like this one, but in general you should give new variables new, unique names; it makes programs much clearer.
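Following that advice, an OUTPUT statement that gives each statistic its own name might look like the sketch below. The names M_SBP, M_DBP, SD_SBP, and SD_DBP and the data set name MEANOUT2 are hypothetical choices, and the DATA step holds made-up readings so the example runs on its own. Names listed after a statistic keyword are paired, in order, with the variables on the VAR statement.

```sas
* Hypothetical readings for illustration only;
DATA PRESSURE;
   INPUT SUBJ YEAR SBP DBP;
   DATALINES;
1 1 120 80
1 1 130 90
1 1 140 100
2 1 110 70
1 2 125 85
2 2 115 75
2 2 135 85
;
RUN;

* New, unique names: one per VAR variable, in VAR order;
PROC MEANS DATA=PRESSURE NOPRINT NWAY;
   CLASS SUBJ YEAR;
   VAR SBP DBP;
   OUTPUT OUT=MEANOUT2 MEAN=M_SBP M_DBP
                       STD=SD_SBP SD_DBP;
RUN;
```

For subject-year combinations with only one reading (subject 2 in year 1 here), the standard deviation is missing, because the default divisor n - 1 is zero.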

Also, if you output more than one statistic, you obviously have to give them new names: you can't use SBP for both the mean and the standard deviation, for example. The resulting output data set, MEANOUT, contains the variables YEAR and SUBJ (the CLASS variables), SBP, DBP, _TYPE_, and _FREQ_. If you had used a PROC PRINT statement to list this data set, you would have obtained the following output:

Listing of data set Meanout

Why is this output data set so skimpy compared to previous examples? Where are the values for the overall population, and for each individual subject over years, as well as for each individual year over subjects? The answer is the NWAY option.

You use this option to tell the system to produce output only for the highest-level interaction of the CLASS variables. Here, this results in means for each SUBJ-YEAR combination only (_TYPE_ = 3). Observations with lower _TYPE_ values (the overall mean, _TYPE_ = 0, and the means for each single CLASS variable) are not included in the output data set. The combination of the NOPRINT and NWAY options makes a very powerful and frequently used tool for producing data sets.
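Because NWAY leaves exactly one observation per SUBJ-YEAR combination in MEANOUT, that data set can go straight into the second pass, where YEAR alone is the CLASS variable. The sketch below is one plausible form of that second program, not necessarily the one used in the next step of this example. It includes a few hypothetical readings so it can run on its own, and the output data set name YEARLY is likewise chosen only for illustration.

```sas
* Hypothetical readings for illustration only;
DATA PRESSURE;
   INPUT SUBJ YEAR SBP DBP;
   DATALINES;
1 1 120 80
1 1 130 90
1 1 140 100
2 1 110 70
1 2 125 85
2 2 115 75
2 2 135 85
;
RUN;

* Step 1: per-subject means for each year (one row per SUBJ-YEAR);
PROC MEANS DATA=PRESSURE NOPRINT NWAY;
   CLASS SUBJ YEAR;
   VAR SBP DBP;
   OUTPUT OUT=MEANOUT MEAN=;
RUN;

* Step 2: average the per-subject means within each year,
  so every subject counts once regardless of its number of readings;
PROC MEANS DATA=MEANOUT NOPRINT NWAY;
   CLASS YEAR;
   VAR SBP DBP;
   OUTPUT OUT=YEARLY MEAN=;
RUN;

PROC PRINT DATA=YEARLY;
RUN;
```

With these readings, year 1 yields an unweighted SBP mean of 120 (the average of the subject means 130 and 110), whereas PROC MEANS run directly on PRESSURE with CLASS YEAR would report 125. In YEARLY, _FREQ_ counts subjects (here 2 per year) rather than individual readings.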

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
