Maintaining Star Schemas - Informatica

When you design a star schema, you create a fact table for information collected over time, such as purchases or transactions. You then build separate dimension tables for related lists of information, such as inventory or shipping methods. Each dimension table has a logical primary or generated composite key to enable access to dimensional data. For example, you might use an inventory part number as a primary key, or create a composite key using a part number and a current flag.

The following figure shows a star schema with four dimension tables and one fact table:

star schema with four dimension tables and one fact table

When you implement a star schema, you decide how to handle updates to the fact and dimension tables. Fact tables change regularly as new information collects. Do you need to keep all existing data in the fact table, or do you want only the most recent version or snapshot?

If you do not need any historical fact information, you can drop or truncate the existing fact table before using a new session in a workflow. To keep historical information in a fact table, you usually append the latest snapshot to the existing table, using a flag such as a load date or session number to identify the most recent snapshot.

Although dimension tables are typically static lists, most dimension tables do change over time. For example, you might need to update an inventory dimension once a month to reflect new or changed part numbers. Since these changes are smaller in magnitude compared to changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.

Slowly growing dimensions are dimension tables that have slowing increasing dimension data, without updates to existing dimensions. You maintain slowly growing dimensions by appending new data to the existing table.

Slowly changing dimensions are dimension tables that have slowly increasing dimension data and updates to existing dimensions. When updating existing dimensions, you decide whether to keep all historical dimension data, no historical data, or just the current and previous versions of dimension data.

When you do not need historical information in a slowly growing or slowly changing dimension table, you can drop or truncate the existing table before using a new session in a workflow. However, in some cases, inserting new dimensions and updating existing dimensions can be more efficient than reloading the entire table.

When you need historical information in a dimension table, you decide how to differentiate between current and historical data in the target:

  • To keep a full history, you might version new data by:
  • -Creating a version number and versioning the primary key.
    -Creating a composite key using a current version flag.
    -Creating an effective date range.

  • To keep a partial history, you might keep a current version and a previous version in a single row. You can also enter a timestamp to indicate the most recent update.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Informatica Topics