# Understanding Naive Bayes Principles - Data Mining

The mathematical method proposed by Bayes uses a combination of conditional and unconditional probabilities. At first glance, the formula may seem a bit daunting, but when you break it down into its component parts, it’s really quite easy to understand.

Let’s use the congressional records as an example to build up Bayes’ rule. First, suppose that you had to guess the party affiliation of a congressperson during the 2002 congressional sessions without any additional information. Given that there were more Republicans in the House than Democrats that year (51% to 49%), your best guess would be Republican, because it is the most likely choice. In data mining terms, this unconditional probability is called the prior probability of a hypothesis and is written P(H). In this case, P(Republican) = 51% and P(Democrat) = 49%.

You could increase the likelihood of your guess being correct if you knew the overall voting records of the House members and those of your representative. Table 4.1 shows the votes by party for selected issues in 2002, and a second table shows how the representative in question voted.

Tables: Distinguishing congressional parties by their 2002 voting records. Voting Data by Party Affiliation (Table 4.1); Target Representative. Columns: DEATH, HOMELAND, HELP, CHILD, PARTY.

The numbers in Table 4.1 represent the counts of votes broken down by party affiliation, your target variable. For example, 41 Democrats voted Yea for the Death Tax Repeal Act and 166 voted Nay. This gives you the percentages in the lower part of the table: 41 / (41 + 166) = 20% Yea and 166 / (41 + 166) = 80% Nay. The final column of the table provides the counts and percentages of Democrats and Republicans overall.

The Naïve part of Naïve Bayes tells you to treat all of your attributes as independent of each other given the target variable. This may be a faulty assumption, but it allows you to multiply your probabilities to determine the likelihood of each state. For the target representative, the likelihood that this person is a Democrat, obtained by multiplying the probability of each observed vote among Democrats by the prior P(Democrat), would be:

Likelihood of (D) = 0.2 * 0.57 * 0.94 * 0.89 * 0.49 = 0.0467

Likewise, the calculation for Republican would be:

Likelihood of (R) = 0.98 * 0.03 * 0.83 * 0.995 * 0.51 = 0.0124

You can instantly see that the representative is almost four times as likely to be a Democrat as a Republican based on this voting behavior. You can convert these likelihoods to probabilities by normalizing them to sum to 1.
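The two likelihood calculations and the normalization step can be sketched in a few lines of Python. This is only an illustration using the rounded probabilities quoted above. The names `evidence_given_party`, `priors`, `likelihoods`, and `normalize` are made up for this example, and `math.prod` assumes Python 3.8 or later.

```python
from math import prod

# Per-vote probabilities quoted in the text for the target representative's
# four votes, in the order: Death Tax Repeal, Homeland Security,
# Help America Vote, Child Abduction. (Illustrative names, not from the source.)
evidence_given_party = {
    "Democrat": [0.2, 0.57, 0.94, 0.89],
    "Republican": [0.98, 0.03, 0.83, 0.995],
}

# Prior probabilities: the share of each party in the House.
priors = {"Democrat": 0.49, "Republican": 0.51}


def likelihoods(evidence, priors):
    """Multiply the per-vote probabilities together, then by the prior."""
    return {party: prod(probs) * priors[party] for party, probs in evidence.items()}


def normalize(scores):
    """Rescale the scores so that they sum to 1."""
    total = sum(scores.values())
    return {party: score / total for party, score in scores.items()}


scores = likelihoods(evidence_given_party, priors)
posteriors = normalize(scores)

for party in scores:
    print(f"{party}: likelihood={scores[party]:.4f}, probability={posteriors[party]:.2f}")
```

With these inputs the likelihoods come out at roughly 0.0467 and 0.0124, and normalizing gives about 0.79 for Democrat and 0.21 for Republican.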

Bayes’ rule states that if you have a hypothesis H and evidence E about that hypothesis, then you can calculate the probability of H given E using the following formula:

$$
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
$$

This simply states that the probability of your hypothesis given the evidence is equal to the probability of the evidence given the hypothesis multiplied by the probability of the hypothesis and then normalized. While that seems like a mouthful, let’s apply this to our congressional example.

First, you tackle the probability of the hypothesis given the evidence. In this case, this is the probability that the representative is a Democrat given that she voted Yea on the Death Tax Repeal, Help America Vote, and Child Abduction Acts, and Nay on the Homeland Security Act. To determine this probability, you need to compute the probability of the evidence, given that your hypothesis is true. This is simply a lookup from the counts presented in Table 4.1. For example, if your evidence is that the representative voted Yea on the Help America Vote Act and your hypothesis is that the representative is a Democrat, the table gives the probability of this piece of evidence as 94%. Under the independence assumption, the probability of all the evidence given the hypothesis is simply the product of the probabilities of each individual piece. Next, you multiply by the prior probability of your hypothesis, in this case 49%.
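The lookup from counts amounts to dividing each count by the party's total on that issue. Here is a minimal sketch using only the Democratic counts on the Death Tax Repeal Act quoted earlier (41 Yea, 166 Nay). The function name `conditional_probabilities` and the dictionary layout are assumptions made for illustration.

```python
def conditional_probabilities(counts):
    """Turn one party's vote counts on one issue into P(vote | party)."""
    total = sum(counts.values())
    return {vote: n / total for vote, n in counts.items()}


# Democratic counts on the Death Tax Repeal Act, as quoted in the text.
death_tax_democrats = {"Yea": 41, "Nay": 166}

p_vote_given_democrat = conditional_probabilities(death_tax_democrats)

for vote, p in p_vote_given_democrat.items():
    print(f"P({vote} | Democrat) = {p:.0%}")
```

Applying the same function to each issue and each party would yield the full set of conditional probabilities that the likelihood calculation multiplies together.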

Last, you divide by the probability of the evidence; however, in practice this isn’t necessary. Because you test all possible hypotheses, both Democrat and Republican, and the probability of the evidence is the same for each, this factor drops out when you normalize the results.
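As a worked check with the likelihoods computed earlier, and assuming Democrat (D) and Republican (R) are the only two hypotheses, the probability of the evidence is just the sum of the two unnormalized scores. Dividing by it is therefore the same operation as normalizing:

$$
P(E) = P(E \mid D)\,P(D) + P(E \mid R)\,P(R) \approx 0.0467 + 0.0124 = 0.0591
$$

$$
P(D \mid E) \approx \frac{0.0467}{0.0591} \approx 0.79, \qquad P(R \mid E) \approx \frac{0.0124}{0.0591} \approx 0.21
$$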
