Loading Mechanisms - Data Warehousing

You can use the following mechanisms for loading a data warehouse:

  • Loading a Data Warehouse with SQL*Loader
  • Loading a Data Warehouse with External Tables
  • Loading a Data Warehouse with OCI and Direct-Path APIs
  • Loading a Data Warehouse with Export/Import

Loading a Data Warehouse with SQL*Loader
Before any data transformations can occur within the database, the raw data must become accessible for the database. One approach is to load it into the database. Transportation in Data Warehouses, discusses several techniques for transporting data to an Oracle data warehouse. Perhaps the most common technique for transporting data is by way of flat files.

SQL*Loader is used to move data from flat files into an Oracle data warehouse. During this data load, SQL*Loader can also be used to implement basic data transformations. When using direct-path SQL*Loader, basic data manipulation, such as datatype conversion and simple NULL handling, can be automatically resolved during the data load. Most data warehouses use direct-path loading for performance reasons.

The conventional-path loader provides broader capabilities for data transformation than a direct-path loader: SQL functions can be applied to any column as those values are being loaded. This provides a rich capability for transformations during the data load. However, the conventional-path loader is slower than direct-path loader. For these reasons, the conventional-path loader should be considered primarily for loading and transforming smaller amounts of data.

The following is a simple example of a SQL*Loader controlfile to load data into the sales table of the sh sample schema from an external file sh_sales.dat. The external flat file sh_sales.dat consists of sales transaction data, aggregated on a daily level. Not all columns of this external file are loaded into sales. This external file will also be used as source for loading the second fact table of the sh sample schema, which is done using an external table:

The following shows the controlfile (sh_sales.ctl) to load the sales table:

It can be loaded with the following command:

$ sqlldr sh/sh control=sh_sales.ctl direct=true

Loading a Data Warehouse with External Tables
Another approach for handling external data sources is using external tables. Oracle's external table feature enables you to use external data as a virtual table that can be queried and joined directly and in parallel without requiring the external data to be first loaded in the database. You can then use SQL, PL/SQL, and Java to access the external data.

External tables enable the pipelining of the loading phase with the transformation phase. The transformation process can be merged with the loading process without any interruption of the data streaming. It is no longer necessary to stage the data inside the database for further processing inside the database, such as comparison or transformation. For example, the conversion functionality of a conventional load can be used for a direct-path INSERT AS SELECT statement in conjunction with theSELECT from an external table.

The main difference between external tables and regular tables is that externally organized tables are read-only. No DML operations (UPDATE/INSERT/DELETE) are possible and no indexes can be created on them.

External tables are a mostly compliant to the existing SQL*Loader functionality and provide superior functionality in most cases. External tables are especially useful for environments where the complete external source has to be joined with existing database objects or when the data has to be transformed in a complex manner. For example, unlike SQL*Loader, you can apply any arbitrary SQL transformation anduse the direct path insert method.

You can create an external table named sales_transactions_ext, representing the structure of the complete sales transaction data, represented in the external file sh_sales.dat. The product department is especially interested in a cost analysis on product and time. We thus create a fact table named cost in the sales history schema. The operational source data is the same as for the sales fact table. However, because we are not investigating every dimensional information that is provided, the data in the cost fact table has a coarser granularity than in the sales fact table, for example, all different distribution channels are aggregated. We cannot load the data into the cost fact table without applying the previously mentioned aggregation of the detailed information, due to the suppression of some of the dimensions.

The external table framework offers a solution to solve this. Unlike SQL*Loader, where you would have to load the data before applying the aggregation, you can combine the loading and transformation within a single SQL DML statement, as shown in the following. You do not have to stage the data temporarily before inserting into the target table.

The object directories must already exist, and point to the directory containing the sh_sales.dat file as well as the directory containing the bad and log files.

The external table can now be used from within the database, accessing some columns of the external data only, grouping the data, and inserting it into the costs fact table:

Loading a Data Warehouse with OCI and Direct-Path APIs
OCI and direct-path APIs are frequently used when the transformation and computation are done outside the database and there is no need for flat file staging.

Loading a Data Warehouse with Export/Import
Export and import are used when the data is inserted as is into the target system. No complex extractions are possible.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Data Warehousing Topics