# Computing Totals and Using PROC MEANS to Create a Summary Data Set - SAS Programming

This example shows the default printed output from PROC MEANS and how you can use the procedure to create a SAS data set that contains summary information. Suppose you have a SAS data set SALES, which contains sales figures for a mail order company. Each record in the data set represents the sale of a single item.

The variables are PO_NUM (purchase order number), ITEM (item description), REGION (region of the country where the item was sold), PRICE (selling price of the item), and QUANTITY (number of items sold).

A listing of all the observations in the data set follows:

We use the SALES data set to demonstrate the use of the CLASS statement as well as the two variables _FREQ_ and _TYPE_ that are automatically included in any output data set created by PROC MEANS. In this example, you want to see the summary statistics of the QUANTITY sold (included in the VAR statement) broken down by REGION and ITEM.

One way to do this is to use PROC SORT to pre-sort the data set by REGION and ITEM, and then add a BY statement to the PROC MEANS code. A more efficient approach is to use a CLASS statement to specify the two categorizing variables. This does not require a separate procedure to perform the sort.

In this example you also want to create an output data set which contains the totals (SUM) of the quantities for the various REGION-ITEM categories. Here is the code:

Example

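```sas
/* Print default statistics for QUANTITY by REGION and ITEM, */
/* and write the sums to the output data set QUAN_SUM.       */
PROC MEANS DATA=SALES;
   TITLE 'Sample Output from PROC MEANS';
   CLASS REGION ITEM;
   VAR QUANTITY;
   OUTPUT OUT=QUAN_SUM SUM=TOTAL;
RUN;

/* List the summary data set created above. */
PROC PRINT DATA=QUAN_SUM;
   TITLE 'Summary Data Set';
RUN;
```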
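For comparison, the sort-then-BY alternative mentioned before the example might look like the sketch below. It is not the program whose output is discussed in this example. The data set name SALES_SORTED and the title text are assumptions introduced here so that the original SALES data set is left in its original order.

```sas
/* Alternative sketch: pre-sort the data, then use BY instead of CLASS. */
/* SALES_SORTED is a hypothetical name, not part of the original example. */
PROC SORT DATA=SALES OUT=SALES_SORTED;
   BY REGION ITEM;
RUN;

/* The BY statement requires the input to be sorted by REGION and ITEM. */
PROC MEANS DATA=SALES_SORTED;
   TITLE 'PROC MEANS with a BY Statement';
   BY REGION ITEM;
   VAR QUANTITY;
RUN;
```

With a BY statement, PROC MEANS reports the statistics separately for each BY group rather than in one combined table, and the extra PROC SORT step is what the CLASS approach avoids.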

The output from the example program is shown in two sections: first the output from PROC MEANS, then some discussion, then the output from PROC PRINT.

Output from Example - Computing Totals and Using PROC MEANS to Create a Summary Data Set (PROC MEANS Output)

The format of the report, including the output statistics and the maximum number of decimal places to use, is fairly customizable. If you do not ask specifically for certain statistics to be produced, as in this example, the procedure automatically gives you the following: N, Mean, Standard Deviation, Minimum, and Maximum.

When you ask for the data to be broken down by category, as you did here with the CLASS statement, PROC MEANS also throws in the number of observations (N Obs). N is the number of non-missing data points, and N Obs is the total number of observations per group or sub-group.
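As a rough illustration of that customization, statistic keywords and the MAXDEC= option can be placed on the PROC MEANS statement. The particular statistics, the MAXDEC= value, and the title below are arbitrary choices made for this sketch, not part of the original example.

```sas
/* Sketch: request only N, MEAN, and SUM, printed with at most */
/* one decimal place. The chosen statistics and MAXDEC= value  */
/* are illustrative assumptions.                               */
PROC MEANS DATA=SALES N MEAN SUM MAXDEC=1;
   TITLE 'Selected Statistics from PROC MEANS';
   CLASS REGION ITEM;
   VAR QUANTITY;
RUN;
```

Once any statistic keyword is listed, only the requested statistics are printed in place of the default set.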

In the present case, these numbers are equal because there are no missing data (an easy goal to accomplish when you're making up the data, but not always that easy in real life).

Although the above report is quite useful as is, PROC MEANS can also be used to produce an output data set containing the summary statistics instead of a printed report. The rest of the examples in this chapter do not produce a report directly but, instead, create only an output data set.

How do you produce this output data set? With an OUTPUT statement, of course. The OUT= option in the OUTPUT statement specifies the name of the output data set you want to create.

Various statistics can be included in this data set (we asked for SUM only), depending on the OUTPUT statement statistics you choose. Typical OUTPUT statistics are N=, MEAN=, and SUM=. Following each of these keywords is a list of names for the variables in the newly created data set that will hold the values of N, MEAN, SUM, etc., one name for each of the variables in the VAR list. In this example, SUM=TOTAL stores the sum of QUANTITY in a variable called TOTAL.
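A minimal sketch of an OUTPUT statement that requests several statistics for more than one VAR variable follows. The data set name QUAN_STATS and the output variable names are hypothetical, and the NOPRINT option is used here on the assumption that no printed report is wanted.

```sas
/* Sketch: several OUTPUT statistics for two analysis variables.   */
/* QUAN_STATS and all names after N=, MEAN=, and SUM= are          */
/* hypothetical. Names are matched to the VAR list by position:    */
/* the first name goes with QUANTITY, the second with PRICE.       */
PROC MEANS DATA=SALES NOPRINT;
   CLASS REGION ITEM;
   VAR QUANTITY PRICE;
   OUTPUT OUT=QUAN_STATS
          N=N_QUAN N_PRICE
          MEAN=MEAN_QUAN MEAN_PRICE
          SUM=TOTAL_QUAN TOTAL_PRICE;
RUN;
```

Giving each statistic its own set of output names keeps the resulting variables distinct in the output data set.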

The statistics available for the OUTPUT statement are the same as those available for the PROC MEANS statement itself. The creation of the output data set does not produce printed output. In order to see the resulting data set, you add a PRINT procedure which produces the following listing:

Output from Example - Computing Totals and Using PROC MEANS to Create a Summary Data Set (PROC PRINT Output)

Output data sets produced by PROC MEANS contain a wealth of information. In addition to the actual summary data values, the procedure automatically produces the variables _TYPE_ and _FREQ_, which can be used to identify the population or sub-population contributing to the statistic. Let's use the actual data to explain these.

The first observation has a _TYPE_ = 0 and represents the entire population. The value of TOTAL here (143) is the sum of QUANTITY for all regions and all items. The _FREQ_ variable shows you that there are 11 observations (purchase orders) that contribute to this sum.

The next 3 lines (_TYPE_ = 1) give the sums for each level of the last (rightmost) CLASS variable, ITEM, across regions. Here, _FREQ_ shows how many orders were placed for each ITEM, and TOTAL tells how many of each ITEM were sold in the entire country (across regions).

Following these are 4 lines (_TYPE_ = 2) which represent the sums for each level of the next rightmost CLASS variable, REGION, for all items in each region. Finally, the remaining lines (_TYPE_ = 3) are the totals for each combination of all the CLASS variables, REGION and ITEM.

While this is fairly clear from the listing, if you are up on your binary numbers you'll recognize that the _TYPE_ value, if expressed in binary notation, shows which variables contribute to each line of information, and how they contribute.

The following table should make this clear. For each of the _TYPE_ values, if there is a 1 under a variable in the binary listing, the data are presented for each discrete value of the variable; if there is a 0 under the variable, the observation is summed (or meaned, etc.) over that variable.

| _TYPE_ | REGION | ITEM | What the observation contains |
|--------|--------|------|-------------------------------|
| 0      | 0      | 0    | Sum for all regions and all items |
| 1      | 0      | 1    | Sum for each ITEM (across regions) |
| 2      | 1      | 0    | Sum for each REGION (over all items) |
| 3      | 1      | 1    | Sum for each REGION-ITEM combination |

The _TYPE_ = 0 observation therefore represents the SUM for all items and all regions, the _TYPE_ = 1 observations give the sums for each item (across regions), the _TYPE_ = 2 observations show the sums for each region (over all items), and the _TYPE_ = 3 observations contain the sums for each ITEM-REGION combination. It's not really that daunting once you get the hang of it.
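Because _TYPE_ identifies the level of summarization, it can be used to pick out just the rows you need. The sketch below shows two possible approaches: a WHERE statement that selects the region totals from QUAN_SUM, and the NWAY option, which asks PROC MEANS to write only the highest _TYPE_ value (here, the REGION-ITEM combinations). The data set name QUAN_NWAY and the title are assumptions introduced for illustration.

```sas
/* Sketch 1: list only the region totals (_TYPE_ = 2) from QUAN_SUM. */
PROC PRINT DATA=QUAN_SUM;
   WHERE _TYPE_ = 2;
   TITLE 'Totals by REGION';
   VAR REGION TOTAL _FREQ_;
RUN;

/* Sketch 2: keep only the full REGION-ITEM breakdown by using NWAY. */
/* QUAN_NWAY is a hypothetical data set name.                        */
PROC MEANS DATA=SALES NOPRINT NWAY;
   CLASS REGION ITEM;
   VAR QUANTITY;
   OUTPUT OUT=QUAN_NWAY SUM=TOTAL;
RUN;
```

The WHERE approach is handy when one output data set serves several reports, while NWAY keeps the output data set small when only the most detailed breakdown is needed.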
