# Create statistical calculations - IBM Cognos

IBM Cognos Statistics, powered by IBM SPSS, provides analysts with the ability to distribute reports with statistical insight to the larger business community, further expanding the breadth of reporting capabilities provided by IBM Cognos software. Whether you are obtaining additional insight into key business variables or predicting future outcomes, IBM Cognos Statistics provides the necessary fact-based statistical evidence to support key organizational decisions.

Because IBM Cognos Statistics is seamlessly integrated into IBM Cognos Report Studio, analysts no longer need to extract standardized trusted data from their business intelligence (BI) data warehouse into a separate tool to analyze and report on statistical information. Now, analysts can assemble reports containing statistical information easily and distribute the information across the enterprise, saving valuable time. The statistical capabilities that are provided by IBM Cognos Statistics are powered by the trusted market-leading IBM SPSS statistical engine, enabling you to make the most of best-in-class analytics within your organization.

IBM Cognos Statistics is easy to use for existing IBM Cognos Report Studio authors, because it uses IBM Cognos Report Studio objects and provides a convenient wizard interface.

In this section, we introduce and provide a use case of IBM Cognos Statistics.

IBM Cognos Statistics overview

In this section, we provide the functions of IBM Cognos Statistics and sample images that were created from the Great Outdoors sales company data. With these images, we introduce each statistical function.

Descriptive Statistics

Descriptive Statistics quantitatively summarize a data set. For an overall sense of the data being analyzed, you can show descriptive statistics along with more formal analyses. In this section, we describe the types of statistical objects in Descriptive Statistics.

Basic Descriptive Statistics

Descriptive tables describe the basic features of data in quantitative terms:

• Summary descriptive statistics: One value In this table, we use the Salary to Analysis variable and the Employee to Case variable .You can see that the average salary of this company is 49,664.18 and that the standard deviation is extremely high:31189.147.
• Summary descriptive statistics: One value

Notes regarding Figure:
Mean→ The arithmetic mean is the sum of samples divided by the number of cases.Std.Deviation→ A measure of dispersion around the mean.N→The number of cases, observations, or records.Median→Half of the cases fall above the median, and half of the cases fall below the median.
Minimum→The smallest value of a numeric variable.
Maximum→The largest value of a numeric variable.

• Summary descriptive statistics: Multiple values
In the table in Figure, we use the Salary and Bonus value to Analysis variable and the Employee to Case variable. You can see that the standard deviation of Bonus value is lower than the standard deviation of Salary in this company.
• Summary descriptive statistics: Multiple values

• Descriptive statistics by grouping variable

In the table we use the Salary to Analysis variable, Employee to Case variable, and Country to Grouping variable. You can compare the salary of each country.

Descriptive statistics by grouping variable

Histogram

Histograms display the range of variable values in intervals of equal length. You can use a histogram to summarize the frequency of observations graphically. Figure uses the Unit price to Analysis variable, Product to Case variable, and Country to Grouping variable. Unit price distribution regarding all the products of this company displays.

Histogram

Boxplot

A boxplot, which is also known as a “box-and-whisker” chart, is a convenient way to show groups of numerical data, such as these types:

• Minimum and maximum values
• Upper and lower quartiles
• Median values
• Outlying and extreme values

Figure uses the Gross profit to Analysis variable, Retailer name to Case variable, and Region to Grouping variable. You can see the Gross profit distribution of retailers of each region and that VIP Department Stores is an excellent retailer in the U.S.

Boxplot

Q-Q Plot

You can create a quartile-quartile (Q-Q) plot to chart the quartiles of a variable’s distribution against a distribution of your choice, including the normal distribution. Figure uses the Salary to Analysis variable and Employee to Case variable. You can see that several of the high-salaried and low-salaried employees are out of range in the normal distribution in this company.

Q-Q Plot

Means comparison

You can compare the means of two or more groups to determine if the difference between the groups is statistically significant, that is, if the difference is due to something other than random chance. You can use two types of statistical objects in means comparison.

One-Sample t-Test

The One-Sample t-Test tests the probability that the difference between the sample mean and a test value is due to chance. Probabilities of .05 or less are typically considered significant.

In this table, we use the Revenue to Analysis variable, Product to Case variable, Product line to Grouping variables, and 30000000 to Test value. One-Sample t-Test provides two types of results: one type is One-Sample Statistics and the other type is One-Sample Test. Check the Sig. or significance values in the One-Sample Test, and you can see that Camping Equipment, Personal Accessories, and Golf Equipment do not differ significantly compared to the Test value, but Mountaineering Equipment and Outdoor Protection differ significantly.

One-Sample Statistics

One-Sample Test

One-Way ANOVA

You can use One-Way ANOVA to assess whether groups of means differ significantly. ANOVA assumes that there is homogeneity of variance, that is, that the variance within each of the groups is equal. You can check for homogeneity of variance by using the Levene’s test. Probabilities of .05 or less are typically considered significant.

In this table, we use Salary to Dependent variables, the Branch region to Independent variable, and the Employee to Case variable. One-Way ANOVA provides various kinds of results. For example, this test provides these three tables and one chart. The Multiple Comparisons table shows the salary difference for each country.

ANOVA

Multiple Comparisons

Homogeneous subsets

Means Plots

Nonparametric tests

You use nonparametric tests to compare frequencies in categorical data. You test for significant differences between observed frequencies and expected frequencies in data that does not have a normal distribution.

One-Way Chi-Square Test

One-Way Chi-Square Tests, which are also known as chi-square goodness-of-fit tests, compare observed frequencies against expected frequencies using data from a single categorical variable.

In this table, we use the Branch region to Analysis variable, Vacation days taken to count variable, and Employee to Case variable. The One-Way Chi-Square Testprovides the types of results. You can see Central Europe is the region whose employees take the most vacation.

Frequencies by Branch region

Test Statistics

Two-Way Chi-Square Test

Two-Way Chi-Square Tests, which are also known as chi-square tests of independence, compare observed frequencies against expected frequencies using data from two categorical variables.

In this table, we use Level of education to Analysis variable1, Previous defaulted to Analysis variable2, and the Customer ID to Case variable. The Two-Way Chi-Square Test provides various types of results. You can see Pearson Chi-Square is significant (<0.5), which means that there is a significant difference between the default rates of customers with differing levels of education.

Case Processing Summary

Crosstabulation

Chi-Square Tests

Correlation and Regression

Correlation and regression analysis let you examine relationships between variables.

Basic Correlation

Basic Correlation is a measure of association between two variables. The existence of a correlation does not imply causality, but simply helps you to understand the relationship.

This table uses Unit price to Analysis variable1, Quantity to Analysis variable2, and the Product to Case variable. Basic Correlation provides these kinds of results. You can see that the Pearson Correlation is 0.904, which means that there is a positive relationship between Product cost and Gross profit.

Basic Correlation chart

Basic Correlation Descriptive Statistics

Basic Correlation correlations

Linear Regression

Linear Regression examines the relationship between one dependent variable and one or more independent variables. You can use Linear Regression to predict the dependent variable when the independent variables are known.

In this table, we use the Gross profit to Dependent variable, Product cost to Independent variable, and Product to Case variable. Linear Regression provides various types of results The key statistic of interest in the coefficients table is the unstandardized regression coefficient, Product cost 0.528.

The regression equation is as follows:

dependent variable = slope * independent variable + constant

The slope is how steep the regression line is, based on a scatterplot. The constant is where the regression line strikes the y-axis when the independent variable has a value of 0.

In this example, the slope is 0.528, and the constant is 2861822.972. So, the regression equation is the predicted value of Gross profit = 0.528* Product cost + 2861822.972.

Linear Regression Variables Entered/Removed

Linear Regression Model Summary

Linear Regression ANOVA

Linear Regression Coefficients

Curve Estimation

You can use Curve Estimation to plot a curve through a set of points to examine the relationship between one independent variable and one or more dependent variables. The aim of Curve Estimation is to find the best fit for your data, expressed as the correlation coefficient R square.

You can choose one or more curve estimation regression models:

• Linear
• Logarithmic
• Inverse
• Cubic
• Power
• Compound
• S
• Logistic
• Growth
• Exponential

This table uses the Gross profit to Dependent variable, Product cost to Independent variable, and Product to Case variable with Linear model. Curve Estimation provides these kinds of results. You can see differences between the estimated line and the actual value. You can try using separate models with your data to help you find the model with the optimum fit.

Curve Estimation chart

Curve Estimation Model Summary

Curve Estimation ANOVA

Curve Estimation Coefficients

Control Charts

All processes show variation, but excessive variation can produce undesirable or unpredictable results. You use statistical process control (SPC) to monitor critical manufacturing and other business processes that must be within specified limits.

Control Charts plot samples of your process output collected over time to show you whether a process is in control or out of control.

X-Bar

Plot the average of each subgroup. An X-Bar chart is often accompanied by either the R chart or S chart.

X-Bar chart

X-Bar Rule Violations

X-Bar Process Statistics

R charts

R charts plot range values by subtracting the smallest value in a subgroup from the largest value in the same subgroup. The center line on the chart represents the mean of the ranges of all the subgroups.

R chart

S charts

S charts plot the standard deviations for each subgroup. The center line on the chart represents the mean of the standard deviations of all the subgroups.

S chart

Moving Range

Moving Range charts plot the difference between each sample value and the preceding sample value. The center line on the chart represents the average change from one sample to another sample.

Moving Range

Individuals

Individuals charts plot the measured value of each individual sample. The center line on the chart represents the average of all individual samples in the chart

Individuals chart

Individuals Rule Violations

p chart

The p chart plots the percentage of defective units, such as the percent of automobiles with defects per shift. Sample sizes do not need to be equal. They can vary between collection periods.

np chart

The np chart plots the number of defective units, such as the number of automobiles with defects per shift. Sample sizes must be equal.

c chart

The c chart plots the number of defects, such as the total number of defects per shift. Sample sizes must be equal.

u chart

The u chart plots the number of defects per unit, such as the number of defects per automobile per shift. Sample sizes do not need to be equal. They can vary between collection periods.

IBM Cognos Statistics use case:

Create an IBM Cognos Statistics report

“Business scenario and personas used in this book” introduces the Great Outdoors company business scenario. We use IBM Cognos Statistics to answer the following business question for the company:

• How many units of a product should I buy by each period of the year?

The executives wants a summarized report of sales performance that shows the relationship between the sales quantity and inventory of each product.

In this scenario, Ben Hall is the Analyst. He wants to create a sales summary report that shows the statistical relationship between the sales quantity and the inventory in the second quarter (2Q) of 2007 with IBM Cognos Statistics. Also, he wants to show which product is a “pain” point or poor seller.

Create a statistics chart

First, Ben creates a statistics chart. To create the chart:

1. Launch IBM Cognos Report Studio with the Go Data Warehouse (query) package.
2. Click the Insert Table icon, and create a 1 x 2 table.
3. Drag a Curve Estimation statistic object from the Insertable Objects pane to the left side of the table.
4. In the Select Statistic dialog box, expand Correlation and Regression, and click Curve Estimation, as shown in Figure Click OK. In the next window, click Cancel.
5. Select Statistic dialog box

• Insert the following measures and items,
• Quantity to Dependent variable
• Opening inventory to Independent variable
• Product to Cases variable

Insert measures and items

6. Create an advanced filter with Year=2007 and Quarter=’Q2’.
7. Run a report.

Figure shows the result. The red circles show the pain points. These points are much lower than the estimated line, which means that these items had too much inventory compared to their sales quantities.

If you want to identify the item name of the pain point, there is no direct way to identify the pain points at this point. So, you can use the crosstab to identify them.

Curve Estimation chart

Create crosstab to identify pain point

Next, Ben creates a crosstab. To create a crosstab:

1. Before creating a crosstab, make a note about the following information in the statistical report:
• Slope:0.533
• Constant:17297.425
2. From this information, you can recognize the following equation:

predicted value of Quantity = 0.533* Opening inventory + 17297.425

Coefficients

3. Drag a Crosstab object from the Insertable Objects pane to the correct table.
4. Insert Product in the Rows area. Insert Quantity and Opening inventory in the Columns area,
5. Crosstab

6. Create the same advanced filter for the crosstab that you created in the statistic report.
7. Create the estimated column:
8. a. Click Opening inventory, and add a custom calculation.
b. Click *(multiplication) in the operation, type slope value(0.533)in the Number field, and click OK.
c. Click the (Opening inventory * 0.533) column, and add a custom calculation.
d. Click +(addition) in operation, type Constant value(17297.425) in the Number field, and click OK.
e. Delete the (Opening inventory * 0.533) column.
f. Click the ((Opening inventory * 0.533) + 17297.425) cell, and rename it as “Regression” in the property pane.

Crosstab

9. Click the Regression, Quantity cell, and add the calculation (Regression - Quantity).
10. Move (Regression - Quantity) to the right edge of the crosstab.
11. Click the (Regression - Quantity) cell, and set the order as Descending
12. Crosstab

13. Adjust the crosstab location appropriately.
14. Run the report.

In this report, the crosstab shows the order of difference between the estimated line and the actual sales Quantity. You can identify the item name of the pain points with Quantity and Opening inventory values, You can see that these item names are “Glacier Basic” and “Double Edge”.

Statistic report