DENORMALIZATION - IBM Mainframe

No major application will run in third normal form. This is probably as heretical a statement as can be made in the face of modern relational technology, but it needs to be said. Perhaps, as CPUs get faster and parallel-processing architectures are better exploited, this will no longer be true; more likely, the size of major applications will increase as well. Demand for information and analysis will probably continue to outpace the ability of machines to process it in a fully normalized fashion. Now, before cries for another inquisition begin, this needs to be explained. The issue of normalization has several components. This section does not challenge the relational model or the process of normalization, which together are an excellent and rational method of analyzing data and its fundamental relationships. It challenges the following fallacies:

  • Normalization completely rationalizes data.
  • Normalization accurately maps how humans work with data.
  • Normalized data is the best representation of data.
  • Data stored non-redundantly will be accessed faster than data stored many times.
  • Normalized tables are the best way to store data.
  • Referential integrity requires fully normalized tables.

Normalization is simply a method for analyzing elements of data and their relationships, and the relational model is the theoretical superstructure that supports the process. Together, they provide a way of viewing the world of data. But they are not the only correct or useful ways to view data. In fact, in a complex set of relationships, even third normal form becomes insufficient rather quickly. Higher normal forms have been conceived to cope with these more difficult relationships, although they are rarely used outside of academia. Theorists readily acknowledge that even these fail to completely model reality.

Nevertheless, both the sufficiency of the relational model and the necessity of third normal form have become so sacrosanct in some circles that some vendors are even beginning to enforce third normal form in their data dictionaries. You won't be allowed to violate it, never mind that the application may not run! Many database vendors, however, take a much more practical approach. While acknowledging the genius of the relational model, even to the degree of obeying Codd's rules, they also provide tools that let developers use their own judgment about normalization, referential integrity, procedural language access, non-set-oriented processing and other heretical techniques. In the real world, with live data, demanding users and real demands on performance and ease of use, this flexibility is fundamental to success. Normalization is analysis, not design. Design encompasses issues, particularly performance, ease of use, maintenance and the straightforward completion of business tasks, that normalization does not account for.

When analyzing the data of a business, normalizing to at least third normal form ensures that each non-key column in each table depends only on the whole primary key of that table. If the data relationships are complex enough, normalizing to a higher form does a fine, if not complete, job of giving you a deep understanding of the data and of the relationships between its elements that must be protected and sustained as the application and database are built.
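
As a hypothetical illustration (the table and column names below are invented for this sketch, not taken from any particular system), an order table that also carries the customer's name and city is not in third normal form, because those columns depend on the customer number rather than on the order table's primary key; normalization moves them into their own table:

    -- Hypothetical example: CUSTOMER_NAME and CUSTOMER_CITY depend on
    -- CUSTOMER_NO, not on ORDER_NO, so this table violates third normal form.
    CREATE TABLE ORDERS_UNNORMALIZED
        (ORDER_NO       INTEGER     NOT NULL PRIMARY KEY,
         ORDER_DATE     DATE        NOT NULL,
         CUSTOMER_NO    INTEGER     NOT NULL,
         CUSTOMER_NAME  VARCHAR(40),
         CUSTOMER_CITY  VARCHAR(30));

    -- Normalized: every non-key column depends only on the whole primary
    -- key of its own table.
    CREATE TABLE CUSTOMER
        (CUSTOMER_NO    INTEGER     NOT NULL PRIMARY KEY,
         CUSTOMER_NAME  VARCHAR(40) NOT NULL,
         CUSTOMER_CITY  VARCHAR(30));

    CREATE TABLE ORDERS
        (ORDER_NO       INTEGER     NOT NULL PRIMARY KEY,
         ORDER_DATE     DATE        NOT NULL,
         CUSTOMER_NO    INTEGER     NOT NULL REFERENCES CUSTOMER (CUSTOMER_NO));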

For a major application, however, or even a simple application where tasks do not readily map to fully normalized tables, once the analysis is complete, the design process may need to denormalize some of the tables in order to build an application that is responsive, maps to the user's tasks and will actually complete its processing in the time windows available.
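
For instance (again with invented names, and offered as a sketch rather than a prescription), if an order-inquiry screen must show the customer's name alongside every order, the design might carry that one column redundantly in the orders table so the screen can be built from a single table instead of a join:

    -- Hypothetical denormalization: CUSTOMER_NAME is copied into ORDERS so
    -- the inquiry screen needs no join; the copy must be kept in step with
    -- the CUSTOMER table by the application or by a trigger.
    ALTER TABLE ORDERS
        ADD COLUMN CUSTOMER_NAME VARCHAR(40);

    UPDATE ORDERS O
       SET CUSTOMER_NAME = (SELECT C.CUSTOMER_NAME
                              FROM CUSTOMER C
                             WHERE C.CUSTOMER_NO = O.CUSTOMER_NO);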

Denormalization is the process of deliberately increasing redundancy in the database, either for convenience or to improve performance, and it should take place only after the model has been fully normalized. Storing the results of frequently performed operations or calculations is a typical example. In an accounting package, for instance, there will be a daily transaction table that contains the details of every transaction under every head of account, and usually also a monthly transaction summary that holds only the monthly aggregate of the transactions for each head of account. The data in the monthly summary table is completely redundant, because everything in it can be derived from the daily transaction table using SQL statements. But the monthly summary is read very frequently, and recalculating it every time would slow the application down and result in poor response times. The trade-off is that updates to the daily transaction table are not automatically reflected in the summary table unless database triggers are created to ensure that data integrity is maintained. So before denormalizing, make sure that the costs of maintenance do not outweigh the benefits.
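
A minimal sketch of that summary table, using invented table and column names and generic SQL (exact syntax varies by database), might look like this:

    -- Hypothetical daily transaction table: one row per transaction
    -- per head of account.
    CREATE TABLE DAILY_TXN
        (TXN_ID       INTEGER       NOT NULL PRIMARY KEY,
         ACCT_HEAD    CHAR(8)       NOT NULL,
         TXN_DATE     DATE          NOT NULL,
         AMOUNT       DECIMAL(15,2) NOT NULL);

    -- Redundant monthly summary: everything in it can be derived from
    -- DAILY_TXN, but it is kept because it is read so often.
    CREATE TABLE MONTHLY_TXN_SUMMARY
        (ACCT_HEAD    CHAR(8)       NOT NULL,
         TXN_YEAR     SMALLINT      NOT NULL,
         TXN_MONTH    SMALLINT      NOT NULL,
         TOTAL_AMOUNT DECIMAL(15,2) NOT NULL,
         PRIMARY KEY (ACCT_HEAD, TXN_YEAR, TXN_MONTH));

    -- One way to refresh the redundant data: rebuild it periodically.
    -- Alternatively, triggers on DAILY_TXN can maintain it row by row,
    -- at the cost of extra work on every insert, update and delete.
    DELETE FROM MONTHLY_TXN_SUMMARY;
    INSERT INTO MONTHLY_TXN_SUMMARY
           (ACCT_HEAD, TXN_YEAR, TXN_MONTH, TOTAL_AMOUNT)
    SELECT ACCT_HEAD, YEAR(TXN_DATE), MONTH(TXN_DATE), SUM(AMOUNT)
      FROM DAILY_TXN
     GROUP BY ACCT_HEAD, YEAR(TXN_DATE), MONTH(TXN_DATE);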

It is very important to understand that the design of the data structures will have a profound impact on the performance and structure of the front-end application and reports. Although it is possible to 'work around' a poor data model with a great deal of extra code, it is not desirable to do so. Ideally, an application should start with a fully normalized data structure, which will serve as a firm foundation.

However, any recommendations for producing better response times will differ from application to application. They will also change over time, as query optimization methods improve and as more and more CPU power is pushed out to the peripheral devices in the computer system and network. Benchmarking your application on your own system is the only way to truly optimize your database.

