Naïve Bayes algorithm - Data Mining

The Naïve Bayes algorithm, in conjunction with the viewers provided in SQL Server Analysis Services 2005, provides a very effective way to explore your data. Because the processing phase of the algorithm merely counts the first-order correlations between the inputs and the outputs, you don’t have to worry about picking the “correct” inputs; you can simply throw anything you’ve got at it. This does not hold true when using the algorithm for predictive purposes. When building a predictive model with Naïve Bayes, you must take care that the input attributes are relatively independent. For example, if input A and input B always have the same value, the effect is to multiply the weight of input A by two, which is something you generally want to avoid. Because of this behavior, it is particularly important to evaluate the accuracy of your model with holdout data using the lift chart. Typically, although Naïve Bayes can be a powerful predictor, many people use more sophisticated algorithms such as decision trees or neural networks for prediction when available.
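The double-weighting effect of duplicated inputs can be made concrete with a short derivation. This is a sketch of the textbook Naïve Bayes scoring rule, not a description of the Analysis Services internals; the symbols C (an output state) and a, b (the observed values of inputs A and B) are notation introduced here for illustration.

```latex
% Naive Bayes score for output state C given A = a and B = b,
% assuming the inputs are independent given C:
P(C \mid a, b) \;\propto\; P(C)\, P(a \mid C)\, P(b \mid C)

% If B always has the same value as A, then P(b \mid C) = P(a \mid C), so:
P(C \mid a, b) \;\propto\; P(C)\, P(a \mid C)^{2}

% In log form, the evidence from A is counted twice:
\log P(C \mid a, b) \;=\; \log P(C) + 2 \log P(a \mid C) + \text{const}
```

The correct posterior in this situation would use the factor for A only once, which is why strongly correlated inputs distort predictions but do no harm when the model is used only for exploration.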

Exploring a Naïve Bayes model will tell you how your attributes are related to each other in ways that aren’t easily discovered when using other methods. Using the previous example of congressional voting records, you can easily see what the most important votes are for each party. You can see how votes on a particular act are distributed across party lines. You can even see how votes on an act are distributed across the votes of every other act and how they are related to each other.

This ability to explore the relationships between attributes can be applied to many problems. What are the differences between my satisfied and unsatisfied customers? What factors are related to defects in my production line? What differentiates weekly and monthly movie renters? This ability can be combined with the concept of nested tables to provide a further realm of insights. What’s the difference between people who bought the movie Fargo and those who didn’t? How are all my products related? Naïve Bayes provides quick and understandable answers to all of these questions.

Because Naïve Bayes is a rather simple algorithm, issuing DMX commands to Naïve Bayes models is completely standard. The only issue to keep in mind is that the Microsoft_Naive_Bayes implementation supports only discrete attributes, so any continuous input or output columns must be discretized to be consumed by the algorithm. Creating a Naïve Bayes model with continuous columns will result in an error.
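One way to meet the discrete-only requirement is to have the column bucketed when the model is defined. The following DMX is a minimal sketch and does not come from the original text: the model name [Tenure Demo] and the columns [ID] and [Years In Office] are hypothetical, and the bucketing method and bucket count are arbitrary choices.

```dmx
-- Hypothetical model. [Years In Office] is numeric in the source data,
-- so it is declared DISCRETIZED (here: 5 buckets of roughly equal
-- population) instead of CONTINUOUS, which Microsoft_Naive_Bayes rejects.
CREATE MINING MODEL [Tenure Demo]
(
    [ID]              LONG KEY,
    [Party]           TEXT DISCRETE PREDICT,
    [Years In Office] LONG DISCRETIZED(EQUAL_AREAS, 5)
)
USING Microsoft_Naive_Bayes
```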
Given the exploratory nature of the Naïve Bayes algorithm, it is often useful to create ad hoc data mining models on arbitrary sets of data. For instance, to create the voting model described previously, you would issue a statement such as:
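The original statement is not reproduced in this text. A minimal sketch of what such a statement might look like follows. It assumes the model name 'Voting Records' used by the CALL statements in this chapter, a hypothetical [ID] key column, and only the two vote columns named in this section; a real voting model would list one column per vote.

```dmx
-- Sketch only: the column list is an assumption, not the original statement.
-- [Party] is marked PREDICT so that it becomes the output attribute;
-- every vote is a DISCRETE text column (for example Yes / No).
CREATE MINING MODEL [Voting Records]
(
    [ID]            LONG KEY,
    [Party]         TEXT DISCRETE PREDICT,
    [Food Stamps]   TEXT DISCRETE,
    [Nuclear Waste] TEXT DISCRETE
)
USING Microsoft_Naive_Bayes
```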

You would then train the model with a standard INSERT INTO statement. If all the columns in the source table are the same as those in the model and in the same order as the model, you can simplify the INSERT INTO statement by not specifying any column names, like this:
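The training statements are likewise missing from this text. The sketch below shows both forms under stated assumptions: [Voting Data] is a hypothetical Analysis Services data source, VotingRecords is a hypothetical source table, and the columns match the hypothetical model definition with [ID], [Party], [Food Stamps], and [Nuclear Waste].

```dmx
-- Run ONE of the two statements below, not both.

-- Form 1: explicit mapping of model columns to source columns.
INSERT INTO [Voting Records]
    ([ID], [Party], [Food Stamps], [Nuclear Waste])
OPENQUERY([Voting Data],
    'SELECT ID, Party, [Food Stamps], [Nuclear Waste] FROM VotingRecords')

-- Form 2: simplified form without column names, relying on the source
-- columns matching the model columns in name and order, as described above.
INSERT INTO [Voting Records]
OPENQUERY([Voting Data],
    'SELECT ID, Party, [Food Stamps], [Nuclear Waste] FROM VotingRecords')
```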

At this point you can use the model for prediction or browsing. For predicting, you use a standard SELECT statement with a PREDICTION JOIN clause. For example, the following statement uses parameters to predict party affiliation based on the Food Stamps and Nuclear Waste issues:
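The prediction statement is not reproduced in this text either. A minimal sketch of such a query follows, assuming a model named [Voting Records] with [Party], [Food Stamps], and [Nuclear Waste] columns; the parameter names @FoodStamps and @NuclearWaste and the column aliases are illustrative.

```dmx
-- Singleton prediction: the input "row" is built from two query parameters.
-- NATURAL PREDICTION JOIN maps input columns to model columns by name,
-- so no ON clause is needed.
SELECT
    Predict([Party])            AS [Predicted Party],
    PredictProbability([Party]) AS [Probability]
FROM [Voting Records]
NATURAL PREDICTION JOIN
(SELECT @FoodStamps   AS [Food Stamps],
        @NuclearWaste AS [Nuclear Waste]) AS t
```

The client supplies a state such as Yes or No for each parameter when it executes the query.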

The result will be based on the values specified in the parameters.

Understanding Naïve Bayes Content
Naïve Bayes content is laid out in four levels. The first level is simply the model itself. The children of the model are the output nodes. Each output node has for its children the entire set of input attributes with a dependency probability higher than the MINIMUM_DEPENDENCY_PROBABILITY parameter. Finally, each input node has a child for each state the input can take, with the distributions of the output attribute states. This arrangement is shown in the figure below.

Naïve Bayes content hierarchy
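The hierarchy can also be inspected directly with a content query. This is a minimal sketch, assuming the model is named [Voting Records]; the choice of columns is illustrative and uses the standard content schema rowset.

```dmx
-- Each row is one node in the content hierarchy. PARENT_UNIQUE_NAME links
-- a node to its parent, so the four levels (model, output attribute,
-- input attribute, input state) can be reconstructed from the result.
SELECT NODE_UNIQUE_NAME, PARENT_UNIQUE_NAME, NODE_TYPE, NODE_CAPTION
FROM [Voting Records].CONTENT
```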

Fortunately, for many content-browsing purposes, there are user-defined functions that can condense the Naïve Bayes content into a somewhat more useful form. The Attribute Characteristics view and the Attribute Discrimination view described later in this chapter both receive their data through built-in user-defined functions that you can use as well. GetPredictableAttributes returns the list of predictable attributes for a specified model along with the NODE_ID for each attribute:

CALL GetPredictableAttributes('Voting Records')

After you have the list of attributes, you can call GetAttributeCharacteristics to return a table describing the characteristics of a value of an attribute. This function takes the model name, the attribute’s node ID, the value of interest, a value type flag, and a threshold value, and returns an ordered list of attributes and values that correlate with the selected attribute value, along with the strength of each correlation. The value type flag tells the function whether the value you are specifying is a value from the model or the intrinsic “missing” value. Setting the value type to 1 indicates that the value of interest is a known state of the attribute, for example Yes or No. Setting it to 0 indicates that the value is the intrinsic “missing” value, which occurs when the attribute does not appear in a case, when it is NULL, or when the specific value is removed from the model by feature selection. The threshold is the minimum correlation strength returned by the function and is used to limit the number of returned rows. A call to get the characteristics of Democrats from the Voting Records model would look like this:

CALL GetAttributeCharacteristics('Voting Records', '10000000i', 'Democrat', 1, 0.0005)

A similar function, GetAttributeDiscrimination, takes two values of an attribute and returns an ordered list of attributes along with the strength with which they differentiate the two values. Negative strength numbers indicate that the attribute value pair on the row favors the first specified value, while positive strength numbers indicate that the second value is favored. Similar to GetAttributeCharacteristics, a value type has to be specified for each value; however, an additional value, 2, can be specified to indicate that you want to compare a value against all other possible values. For example, a query to compare Democrats against all other possible parties would look like this:

CALL GetAttributeDiscrimination('Voting Records', '10000000i', 'Democrat', 1, 'All other states', 2, 0.0005)

To compare Democrats and Republicans, you would issue the following query:

CALL GetAttributeDiscrimination('Voting Records', '10000000i', 'Democrat', 1, 'Republican', 1, 0.0005)

Exploring a Naïve Bayes Model

When exploring a Naïve Bayes model, it is easier to think of the process as simply exploring your data. Because the Naïve Bayes algorithm does not perform any kind of advanced analysis on your data, the views into the model are simply a new way of looking at the data you always had. SQL Server Data Mining provides four different views on Naïve Bayes models that help provide insight into your data. The viewer is accessed through either BI Development Studio or SQL Server Management Studio by right-clicking the model and selecting “Browse.” The views are:

  • Dependency Net
  • Attribute Profiles
  • Attribute Characteristics
  • Attribute Discrimination

Dependency Net
The first tab of the Naïve Bayes viewer is the dependency net. The dependency net (see Figure 4.3) provides a quick display of how all of the attributes in your model are related. Each node in the graph represents an attribute, whereas each edge represents a relationship. If a node has an outgoing edge, as indicated by the arrow, it is predictive of the attribute in the node at the end of the edge. Likewise, if a node has an incoming edge, it is predicted by the other node. Edges can also be bidirectional, indicating that the attributes in the corresponding nodes predict and are predicted by each other.

You can easily home in on the attributes that interest you by using the Find Node feature. Clicking the Find Node button provides a list of all attributes, whether displayed in the graph or hidden. Selecting a node from the list causes it to become selected in the graph. Selected nodes are highlighted, and all connected nodes are highlighted with a color representing their relationship with the selection. Figure 4.3 shows a portion of the dependency net for the Congressional Voting model with the Party node selected. From this view, it is easy to see the relationships that Party has with the other attributes in the model. In addition to displaying the relationships and their directions, the dependency net can also tell you the strength of those relationships. Moving the slider from top to bottom filters out the weaker links, leaving only the strong relationships.

Attribute Profiles
The second tab, the Attribute Profile viewer, provides an exhaustive report of how each input attribute corresponds to each output attribute, one output attribute at a time. At the top of the Attribute Profile viewer, you select which output to look at, and the rest of the view shows how all of the input attributes correlate with the states of that output attribute. The figure shows the attribute profiles for the Party attribute. You can see that the vote on the Abortion Non-Discrimination Act was approximately even, with Republicans voting Yea and Democrats Nay. At the same time, you can see the almost unanimous support for the Child Abduction Prevention Act.

Naive Bayes Dependency Net viewer with the Party node selected

Attribute profiles for the party attribute

You can also use this view to organize your data to be presented the way you see fit. You can rearrange columns by clicking and dragging on their headers, or you can even remove a column altogether by right-clicking the column header and selecting Hide Column. Additionally, if the alphabetical order doesn’t suit you, simply click the header for the attribute state you are interested in, and the row ordering changes based on how important that attribute is in predicting that state.

Attribute Characteristics
The third tab allows you to select an output attribute and value, and shows you a description of the cases where that attribute and value occur. Essentially, this answers the question “what are people who _____ like?” For example, Figure 4.5 shows the characteristics of Democrats. You can see that these representatives in general voted No on the health care, class action, and rental purchase acts, but voted Yes on the Child Abduction Prevention Act. When viewing the attribute characteristics, there are two issues you should keep in mind. First, an attribute characteristic does not imply predictive power. For instance, if most representatives voted for the Child Abduction Prevention Act, then it is likely to characterize Republicans as well as Democrats. Second, inputs that fall below the minimum node score set in the algorithm parameters are not displayed.

Characteristics of attributes, values, and probability

Attribute Discrimination
The last tab, Attribute Discrimination, answers the most interesting question: what is the difference between A and B? With this viewer, you choose the attribute you are interested in and select the states you want to compare, and the viewer displays a modified tornado chart indicating which factors favor each state. Figure 4.6 shows the results distinguishing Republicans and Democrats. Republicans tended to vote for most issues, while Democrats voted against them. Take care when interpreting this view. It does not imply that no Democrats voted for the Death Tax Repeal Act, only that these factors favor one group over the other.

You also have to consider the support level of an attribute before making judgments. Figure 4.7 shows the discrimination between Independents and all other congresspersons. Looking at this figure, you could say that a strong differentiator between Independents and Democrats is the support for the Low Cost Healthcare Act. Unfortunately, you would be wrong. When examining the Mining Legend for that issue, you see that there are actually only two Independents in your data set. Obviously, it is not prudent to draw conclusions based on such limited support.

Distinguishing between Republicans and Democrats

Discrimination between Independents and Congresspersons
