Building OLAP Mining Models Using Wizards and Editors - Data Mining

You have just learned about the basic concepts of OLAP, the relationship of OLAP and data mining, and a few business problems for which data mining techniques can be applied to OLAP cubes. In this section, you build a few typical mining models on a cube. The best tools for creating OLAP mining models are the Data Mining Wizard (DMW) and the Data Mining Editor. Note that you can only build an OLAP mining model based on a cube located in the same Analysis Services database.

Using the Data Mining Wizard
The Data Mining Wizard is a handy and powerful tool to help you build data mining models. There are two branches in the DMW based on the types of the data source, the relational branch and the multidimensional branch. In earlier topic “OLE DB for Data Mining” we showed you how to build mining models using DMW on relational tables. Now, you will use the other half of the wizard to build a few OLAP mining models. The source cube is the Sales cube of FoodMart database, as described in the previous sections of the chapter.

Building the Customer Segmentation Model
The first model to build is about customer segmentation. You want to group customers based on their demographics (Occupation, Marital Status, Member Card, and so forth) as well as the total Store Sales for the segmentation. The demographics are dimension attributes, and Store Sales is a measure that contains the aggregated value of each transaction for each customer. In this model, the case ‘table’ is the Customer dimension and there is no nested table.

To invoke DMW, you simply use the context menu by right -clicking the Mining Model folder in the Solution Explorer window of BI Development Studio. Since the source data is a cube, you select the From existing cube option in the Select the Definition Method page, as illustrated in Figur. After selecting the cube as data source, you also select Microsoft Clustering algorithm as the mining technique for this model.

The next step is to specify the case key. For the OLAP mining model, a case key is a dimension attribute, and the dimension that contains the case key is the Case dimension. The case key may or may not be a dimension key. For example, you can select Last Name as case key. You can even choose Gender as case key, ; as a result, you will have only two cases for the training data, Male and Female. In this example, as because you want to segment all customers, you choose Customer, the dimension key, as the case key.

Select From existing cube as data source

Select From existing cube as data source

Select the case key

Select the case key

After specifying the case key, the DMW asks you to select case-level attributes from a set of related attributes and measures, as displayed in Figure. In this case, you select Member Card, Marital Status, Yearly Income, Occupation, Num Car Owned, Num Children At Home, and a measure, Store Cost. Because a clustering algorithm doesn’t require a predictable attribute, you specify all the selected columns as input in the Specify Mining Column Usage Wizard page. By now, you have defined the mining model. All you need to do is name the model at the last page of the wizard and process the model.

Creating a Market Basket Model
Market basket analysis is a popular data mining task. In this example, you will do some market basket analysis of customer purchases in Washington state based on Sales cube. The algorithm to use for this task is Microsoft Association Rules.

Before starting to build the model, you need to define the basket and item. Since the model is for analyzing customer purchase behavior, the unit of a basket is a customer. A customer buys a set of products, and the purchase details are modeled as a nested table. In an OLAP mining model, a case table is mapped to a case dimension, and a nested table is mapped to a measured group dimension. The dimension, which serves as lookup table for the measure group, is called a nested dimension. In this example, the case dimension is the Customer dimension and the nested dimension is Product.

You use DMW to define the model. Similarly to the Customer Segmentation model in previous example, you choose Customer as the Case dimension and Customer attribute as the case key. You don’t select any attribute other than the case key in the case level (You may select properties in the Customer dimension if needed). After selecting the case key, the mining structure is displayed in Figure.

Now, you add the nested table by clicking the Add Nested Table button shown in Figure (Specify mining model column usage). Apop-up wizard appears, as displayed in Figure (Adding the nested table key). This pop-up wizard is used to add nested tables for the mining structure. In this example, your nested dimension is Product, and the nested key is the dimension key Product. You can also choose non-dimension-key attributes such as Brand to be the nested key; in this case, you would analyze the relationship among product brands, not individual products.

Specify mining model column usage

Specify mining model column usage

Adding the nested table key

Adding the nested table key

Since this model is purely for analyzing the relationships among products, you don’t need to include any other attributes in the nested table, neither do you need to add other nested tables. The mining structure is displayed in Figure. The nested table is named as the measure group by default, and the nested key for your model is Product. You specify the nested table as predictable.

Mining structure for market basket analysis

Mining structure for market basket analysis

When the mining structure is defined, the DMW asks you to slice the source cube if necessary. Your market basket model is for analyzing the cross-selling patterns in Washington. You use the cube slice to slice the customers based on the Country ➪ State ➪ City ➪ Customer hierarchy, and use the Filter expressiondrop-down list to select WAat the State level. Now, the model definition is complete; you name the model in the last page of the wizard and process the model.

Creating a Sales Forecast Model
Forecasting is yet another important data mining task. Acube usually contains a time dimension. It is natural for cube users (those who consume cube information through queries and reports) to ask the outlook for cube measures, for example, how many bottles of red wine will be sold in Washington stores over the next three months? How much revenue can each store make over the next two years?

In this example, you build a mining model using Microsoft Time Series on the Sales cube. The objective of the model is to do monthly forecast about the Unit Sales, Store Sales, and Store Cost for each store.

Because the model is for forecasting sales for each store, the case dimension is Store. You choose the dimension key Store to be the case key as displayed in Figure.

Slicing the cube

Slicing the cube

Select Store as case key

Select Store as case key

You learned in Topic “Microsoft Time Series” that there are different ways to model time series data. Since the case dimension Store doesn’t contain time information, the time information is modeled in the nested table. Click the Add Nested Table button in the Mining Model Column Usage page.

You select the Time by Day time dimension to be the nested dimension. Now you need to specify a dimension attribute to be the nested key. You can choose any time attribute, for example, Quarter. In this case, your model will give the quarterly forecast for each store. Since your forecast time unit is month, you specify the dimension attribute The Month as the nested key, which is modeled as the key time.

In the Select Nested Table Columns page, shown in Figure , you select all the measures to be both input and predictable. As you learned, one of the unique features of the Microsoft Time Series algorithm is cross-prediction among series. For example, if the previous month’s Store Sales has a strong correlation with this month’s Store Cost, the algorithm will discover this pattern and use Store Sales to forecast Store Cost.

Mining model columns

Mining model columns

Specify The Month as nested table key

Specify The Month as nested table key

Select measures as time series

Select measures as time series

Figure displays the mining structure for the forecasting model. The case key is Store, the nested key is Month, and the three measures are included in the nested table. There are two dozen stores in the Store dimension of the Sales cube, and each store has three selected measures. There is a total of 72 time series in this forecast model.

The mining structure for a time series

The mining structure for a time series

Before creating the forecasting model, you need to specify the time frame for the training data. Acube may contain many years of data, and you don’t need to feed all the historical data for model training. Some data may be too old and won’t have any impact on the forecast. Consider stock data, if a stock ticker has over 10 years of trading history, the initial 5 years of data is mostly likely useless for prediction. You can slice the cube on the time dimension, using only last two years data as displayed in Figure.

Slice the cube — specify time frame

Slice the cube — specify time frame

Using the Data Mining Editor
After defining the mining model in DMW, you find yourself in the Mining Model Editor. We explained how to build relational mining model using the Mining Model Editor. Figure (Data mining editor for the OLAP mining model) displays the Mining Model Editor with the market basket model you just defined on the Sales cube. The editor is very similar to the Relational Mining Model Editor; the difference is in the Structure pane at right. It displays the OLAP cube in a relational way, similar to the right part of Figure. There are two “tables” for this model: the case table at the left represents the case dimension Customer, with all the Customer properties and a set of measures. The nested table at right represents the fact table. It has many more attributes than just original fact table. The original fact table contains a set of foreign keys and measures. The nested table in the editor displays all the dimension attributes under the dimension keys and the set of measures. You can add these dimension attributes to the nested table of your mining structure.

Data mining editor for the OLAP mining model

Data mining editor for the OLAP mining model


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Data Mining Topics