RELATIONAL DATA INTEGRITY - IBM Mainframe

From the above discussion of the relational data structure, it is evident that most of the relations have an attribute, which can uniquely identify each tuple in the relation. In some cases there can be more than one attribute, which can uniquely identify each tuple in the relation. This attribute is called the candidate key. In other words a candidate key is an attribute that can uniquely identify a row in a table.

ELEMENT Table

ELEMENT Table

In the above table, the attributes symbol, name and atomic number can uniquely identify each row, so any one can be a candidate key, or the ELEMENT_TABLE has three candidate keys. Since the body of a relation is a set and sets by definition do not contain duplicate elements, it follows that at any given time no two tuples (or rows) of a relation can be duplicates of each other (or in other words no two rows can be the same). Let R be the relation with attributes Al, A2 An. The set of attributes K=(Ai,Aj, An) of R is said to be a candidate key of R if and only if the following two properties are satisfied:

  • Uniqueness - At any given time, no two distinct tuples (rows) of R have the same value for Ai, the same value for Aj and the same value for An.
  • Minimality - No proper subset of the set (Ai, Aj, An) has the uniqueness property.

Every relation has at least one candidate key, because at least the combination of all its attributes has the uniqueness property. In the case of base relations (relations of a base table), one candidate key is designated as the primary key and the remaining candidate keys are called alternate keys. For example in the ELEMENT_TABLE, the relation has three candidate keys. We can choose any one of them as the primary key. So if we choose the symbol as the primary key, then the name and atomic number become alternate keys. But there are no hard and fast rules on how to choose the primary key from the list of candidate keys; it is a matter of preference and convenience of the database designer.

The terms candidate keys and primary keys should not be abbreviated to just 'keys'. The term 'key' has too many meanings in the database world. In the relational model alone, there are candidate keys, primary keys, alternate keys, foreign keys, search keys, parent keys, encryption keys, decryption keys, and so on. So it is better to qualify the word 'key' with the appropriate title to avoid confusion. Consider the following two tables. One is the ELEMENT_TABLE and the other the SHIPMENTJTABLE.

SHIPMENT Table

SHIPMENT Table

Let us take a look at the attribute 'Item' of relation SHIPMENTJTABLE. It is clear that a given value for that attribute, say item 'Au' should be permitted to appear in the database only if the same value appears as a value of the primary key 'Symbol' in the relation ELEMENT_TABLE. Such an attribute is called a foreign key. Or in other words, a foreign key is an attribute or attribute combination of one relation (table) whose values are required to match those of the primary key of some other relation (table). Also the foreign key and the primary key should be defined on the same underlying domain.

Primary Key - Foreign Key relationship

Primary Key - Foreign Key relationship

From the above discussions we are now able to identify many integrity rules (or constraints) for the relational model. Relational data model includes several types of constraints whose purpose is to maintain the accuracy and integrity of the data in the database. The major types of integrity constraints are:

  • Domain Constraints
  • Entity Integrity
  • Referential Integrity
  • Operational Constraints

Domain Constraints

All the values that appear in a column of a relation (table) must be taken from the same domain. As we have seen before, a domain is a set of values that may be assigned to an attribute. A domain definition usually consists of the following components:

  • Domain name
  • Meaning
  • Data Type
  • Size or Length
  • Allowable values or Allowable range (if applicable)

For example, in the ELEMENT table, the domain for the column Symbol is a character of length 2, and should be from the list of the elements. In other, words, the domain of the column Symbol is a value whose maximum length is 2 characters and the first letter is in uppercase and it should be a value from the periodic table of elements. Similarly the domain of the column atomic number is an integer and so on.

Entity Integrity

An entity is any person, place, or thing to be recorded in a database. Each table represents an entity, and each row of a table represents an instance of that entity. For example, if order is an entity, the orderstable represents the idea of an order and each row in the table represents a specific order.

To identify each row in a table, the table must have a primary key. The primary key is a unique value that identifies each row. This requirement is called theentity integrity constraint. The entity integrity rule is designed to assure that every relation has a primary key, and that the data values for that primary key are all valid. Entity integrity guarantees that every primary key attribute is non-null. No attribute participating in the primary key of a base relation is allowed to contain nulls. Primary key performs the unique identification function in a relational model. Thus a null primary key value within a base relation would be like saying that mere was some entity that had no known identity. An entity that cannot be identified is a contradiction in terms, hence the name entity integrity. In some cases, a particular attribute cannot be assigned a data value. There are two situations where this is likely to occur:

  • There is no applicable data value
  • Applicable data value is not known when the values are assigned

For example, consider a situation where you are filling out your personal details. There is a column for fax number and you don't have a fax number. You will leave the field blank. This is an example of no applicable data value. In another case, suppose you are filling the ELEMENT table, you do not know the melting point for Nickel. You know that Nickel has a melting point, but you do not know the exact value at that point in time. So you leave that field blank since that information is not known at that point.

The relational model allows you to assign a null value to an attribute in the above-described situations. A null is a value that is assigned to an attribute when no other value applies, or when the applicable value is unknown. In reality, a null is not a value, but rather the absence of a value. For example, null is not the same as 0 (for numeric fields) or blank (for character fields). The inclusion of nulls in the relational model is somewhat controversial, since operations involving nulls sometimes leads to unpredictable results. On the other hand using null for missing values is a good idea. But whatever the pros and cons of using null, it is imperative that the primary key values be non-null.

Referential Integrity

Referential integrity refers to the relationship between tables. Because each table in a database must have a primary key, this primary key can appear in other tables because of its relationship to data within those tables. When a primary key from one table appears in another table, it is called aforeign key. Foreign keys join tables and establish dependencies between tables. Tables can form a hierarchy of dependencies in such a way that if you change or delete a row in one table, you destroy the meaning of rows in other tables.

In the relational data model, associations between tables are defined using foreign keys. For example, the association between the ELEMENT and the SHIPMENT tables is defined by including the Symbol attribute as a foreign key in the SHIPMENT table. This implies that before we insert a new row in the SHIPMENT table, the element for that order must already exist in the ELEMENT table. If you examine the rows in the SHIPMENT table, you will find that every item name in that table appears in the ELEMENT table.

A referential integrity constraint is a rule that maintains consistency among the rows of two tables (relations). The rule states that if there is a foreign key in one relation, either each foreign key value must match a primary key value in the other table or else the foreign key value must be null.

If base relation (table) includes a foreign key- PR matching the primary key PK of some other base relation, then every value of FK in the first table must either be equal to the value of PK in some tuple (row) of the second table or be wholly null (that is each attribute value participating in that FK value must be null). Or in other words, a given foreign key value must have matching primary key value in some tuple of the referenced relation if that foreign key value is non-null. Sometimes, it is necessary to permit foreign keys to accept nulls. Here it must be noted that the null are of the variety 'value does not exist' rather than 'value unknown'.

Operational Constraints

These are the constraints enforced in the database by the business rules or real world limitations. For example, if the retirement age of the employees in an organization is 60, then the age column of the employee table can have a constraint "Age should be less than or equal to 60." These kinds of constraints, enforced by the business and the environment are called operational constraints.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

IBM Mainframe Topics