Why OLE DB for Data Mining?

Although some data mining technologies have existed since the 1960s, the term "data mining" is relatively new. Before the OLE DB for Data Mining (OLE DB for DM) API was introduced in July 2000, the data mining market was highly fragmented, much like the database market of the 1970s before relational concepts existed. There were no standard concepts for mining models, model training, or making predictions. For many people, data mining was just a set of algorithms, in the same way that people once thought of databases as nothing more than hierarchical structures for storing data. Data mining was a high-end tool available only to those with Ph.D.s in statistics or machine learning, not to database developers.

There have been many data mining products on the market since the late 1990s. Each of these independent software vendors (ISVs) has its own proprietary way of building data mining applications. Most data mining packages include their own algorithms, their own storage formats for model patterns, their own data-cleansing tools, and even their own reporting tools. As a result, data mining has remained an isolated software package rather than part of the data warehouse.

Besides lacking standard concepts, data mining has also lacked a standard programming API. It has been very difficult to integrate data mining results with user applications to close the analysis loop. Most data mining products don't provide APIs, which makes integrating data mining features with business applications painful. Some products generate source code for decision trees or neural networks. The generated code embeds the trained model parameters, for example, the coefficients of a neural network. To deploy such a mining model, the source code must be compiled and linked with the user application. As a consequence, data mining projects are completely locked in to one vendor. If you choose product A for a data mining project and later find that product B has a better time series algorithm, you have to start the project over, because different products use different tools for transforming data, different formats for storing models, and different APIs for integrating them with user applications.

The goal of OLE DB for Data Mining is to define common concepts and common APIs for the data mining world, similar to what SQL has done for the database world. The API should be easy for most database developers to understand, not only for those with a Ph.D. in statistics. In July 1999, Microsoft launched the OLE DB for Data Mining initiative together with many data mining ISVs. One year later, version 1.0 of the OLE DB for Data Mining API was finalized and published on Microsoft's Web site. The API defines common data mining concepts such as mining models, model training, model content, and model prediction. It also defines a query language for data mining whose syntax is similar to SQL.

Since the standard was published, a number of data mining vendors, including Microsoft, Megaputer, Angoss, KXEN, and DBMiner, have developed OLE DB for Data Mining providers. User applications can connect to different data mining providers through OLE DB or ADO connections, as illustrated in Figure 2.3. Each OLE DB for DM provider has its own set of data mining algorithms, and these algorithms can access any tabular source data through OLE DB. The source data can be stored in various formats, such as relational databases, OLAP cubes, text files, and email documents.

OLE DB DM architecture overview
