Introducing the Microsoft Clustering Algorithm - Data Mining

The Microsoft Clustering algorithm finds natural groupings inside your data when these groupings are not obvious. Another way to put this is to say that it finds the hidden variable that accurately classifies your data. For example, you may be part of a large group of people picking up bags at the baggage claim. You notice that a significant percentage of the travelers are wearing shorts and sporting tans, whereas the rest are bundled up in sweaters and coats. You deduce a hidden variable — that one group returned from a tropical clime, and the other group arrived from some cold, wet place.

This capacity for determining the common thread that holds people together makes clustering a popular data mining technique for marketing. You could use clustering to learn more about your customers so that you can target your message to specific groups. For example, a movie retailer may find a group of customers that purchases family movies on a regular basis and another that purchases documentaries less frequently. Sending monthly coupons for Disney films to the latter group obviously wouldn’t be a wise choice. The ability to define and identify your market segments gives you a powerful tool to drive your business.
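As a rough illustration of how such customer groups might be found automatically, the sketch below runs a generic k-means loop: assign each customer to the nearest cluster center, move each center to the mean of its members, and repeat until no customer changes cluster. This is plain k-means, not necessarily what the Microsoft Clustering algorithm does internally. The customer data, the two features, the `kmeans` function, and the choice to start from the first `k` points are all assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Generic k-means sketch; returns (labels, centers)."""
    # Assumption: use the first k points as starting centers (a simplification).
    centers = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # no point moved between clusters, so stop
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical customers: (family movies per month, documentaries per month).
customers = [(8, 0), (9, 1), (7, 1), (10, 0), (1, 3), (0, 2), (1, 2), (2, 3)]
labels, centers = kmeans(customers, k=2)
print(labels)   # cluster index for each customer
print(centers)  # mean purchase profile of each cluster
```

With this made-up data, the first four customers (heavy family-movie buyers) end up in one cluster and the last four in the other, even though the two starting centers both come from the first group.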

Identifying natural groups in your data frees you from simply analyzing your business based on the existing organization. Otherwise, you are limited to the groups that you can imagine, which may not have any bearing on how your customers contribute to your business. Do I sell more family favorites or documentaries? Does more profit come from the Northwest or Southeast region? Are renters better for my business than buyers are? There is an almost limitless number of ways to group your data, and very few (if any) will provide any deep insight into your business. The organization hidden inside your data is a powerful tool for business analysis. A retailer who knows the groups her customers fall into can track sales to those groups on a regular basis.

The figure "Quarterly revenue by region" shows revenue for the movie retailer by region, a typical method of organizing and analyzing sales data. This view shows healthy growth in the business and how each region contributes to that growth. Not all regions are equal, and there are slight differences in the growth for each region. However, there is not much actionable information here: what can you do to increase the overall revenue of your company?

The figure "Quarterly revenue by cluster" shows revenue divided by clusters automatically found in the retailer’s data. You still see the same growth in the company overall, but now you have a completely different breakdown of that information. The retailer has done an excellent job of catering to the Frequent Viewers customers, but revenue has decreased in the Family Buyers and Single Moviegoers segments. Breaking down your revenue this way gives you the actionable information you need to affect your business. Where are those Family Buyers going? How can you get them back? Do you worry about the Single Moviegoers, or do you sacrifice that business to focus on segments that contribute more to the bottom line? Clustering finds the hidden dimension that is unique to your data and your data alone, and it provides information in a way that is impossible to achieve with the predefined organizational methods typically employed to examine your data.
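The contrast between the two breakdowns can be sketched in a few lines of Python. The example below totals the same sales records once by region and once by cluster. Every record, number, and name here (`sales`, `revenue_by`) is invented for illustration and does not come from the retailer's actual data.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, cluster, revenue).
# All numbers are invented for illustration only.
sales = [
    ("Q1", "Northwest", "Frequent Viewers", 100),
    ("Q1", "Northwest", "Family Buyers", 80),
    ("Q1", "Southeast", "Frequent Viewers", 90),
    ("Q1", "Southeast", "Single Moviegoers", 60),
    ("Q2", "Northwest", "Frequent Viewers", 150),
    ("Q2", "Northwest", "Family Buyers", 60),
    ("Q2", "Southeast", "Frequent Viewers", 130),
    ("Q2", "Southeast", "Single Moviegoers", 40),
]


def revenue_by(records, field):
    """Total revenue per quarter, grouped by 'region' or 'cluster'."""
    index = {"region": 1, "cluster": 2}[field]
    totals = defaultdict(lambda: defaultdict(int))
    for record in records:
        quarter, revenue = record[0], record[3]
        totals[record[index]][quarter] += revenue
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {group: dict(per_quarter) for group, per_quarter in totals.items()}


by_region = revenue_by(sales, "region")
by_cluster = revenue_by(sales, "cluster")

for name, table in (("By region", by_region), ("By cluster", by_cluster)):
    print(name)
    for group, per_quarter in table.items():
        print(f"  {group}: {per_quarter}")
```

In this toy data set both regions grow from Q1 to Q2, so the regional view looks uniformly healthy, while the cluster view of the same records shows Frequent Viewers growing and the other two segments shrinking.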

Quarterly revenue by region

Quarterly revenue by cluster


All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd