

This chapter continues our discussion of design issues in relational databases. In general, the goal of a relationaldatabase design is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. One approach is to design schemas that are in an appropriate normal form. To determine whether a relation schema is in one of thedesirable normal forms, we need additional information about the realworld enterprise that we are modelingwith the database. In this chapter,we introduce the notion of functional dependencies. We then define normal forms in terms of functional dependencies and other types of data dependencies.
First Normal Form
The first of the normal forms that we study, first normal form, imposes a very basic requirement on relations; unlike the other normal forms, it does not require additional information such as functional dependencies.A domain is atomic if elements of the domain are considered to be indivisible units. We say that a relation schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.
A set of names is an example of a nonatomic value. For example, if the schema of a relation employee included an attribute children whose domain elements are sets of names, the schema would not be in first normal form. Composite attributes, such as an attribute address with component attributes street and city, also have nonatomic domains.
Integers are assumed to be atomic, so the set of integers is an atomic domain; the set of all sets of integers is a nonatomic domain. The distinction is that we do not normally consider integers to have subparts, but we consider sets of integers to have subparts namely, the integers making up the set. But the important issue is not what the domain itself is, but rather how we use domain elements in our database.
The domain of all integers would be nonatomic if we considered each integer to be an ordered list of digits. As a practical illustration of the above point, consider an organization that assigns employees identification numbers of the following form: The first two letters specify the department and the remaining four digits are a unique number within the department for the employee. Examples of such numbers would be CS0012 andEE1127. Such identification numbers can be divided into smaller units, and are therefore nonatomic. If a relation schema had an attribute whose domain consists of identification numbers encoded as above, the schema would not be in first normal form.
When such identification numbers are used, the department of an employee can be found by writing code that breaks up the structure of an identification number. Doing so requires extra programming, and information gets encoded in the application program rather than in the database. Further problems arise if such identification numbers are used as primary keys:When an employee changes department, the employee’s identification number must be changed everywhere it occurs, which can be a difficult task, or the code that interprets the number would give a wrong result.
The use of set valued attributes can lead to designs with redundant storage of data, which in turn can result in inconsistencies. For instance, instead of the relationship between accounts and customers being represented as a separate relation depositor, a database designer may be tempted to store a set of owners with each account, and a set of accounts with each customer. Whenever an account is created, or the set of owners of an account is updated, the update has to be performed at two places; failure to perform both updates can leave the database in an inconsistent state. Keeping only one of these sets would avoid repeated information, but would complicate some queries. Set valued attributes are also more complicated to write queries with, and more complicated to reason about.
In this chapter we consider only atomic domains, and assume that relations are in first normal form. Although we have not mentioned first normal form earlier, when we introduced the relational model in Relational Model we stated that attribute values must be atomic.
Some types of nonatomic values can be useful, although they should be used with care. For example, composite valued attributes are often useful, and set valued attributes are also useful in many cases, which is why both are supported in the ER model. In many domains where entities have a complex structure, forcing a first normal form representation represents an unnecessary burden on the application programmer, who has to write code to convert data into atomic form. There is also a runtime overhead of converting data back and forth from the atomic form. Support for nonatomic values can thus be very useful in such domains. In fact, modern database systems do support many types of nonatomic values, as we will see in Object oriented Database and Object Relational Databases. However, in this chapter we restrict ourselves to relations in first normal form.
Pitfalls in RelationalDatabase Design
Before we continue our discussion of normal forms, let us look at what can go wrong in a bad database design. Among the undesirable properties that a bad design may have are:
We shall discuss these problems with the help of a modified database design for our banking example: suppose the information concerning loans is kept in one single relation, lending, which is defined over the relation schema
Lendingschema = (branchname, branchcity, assets, customername, loannumber, amount)
Figure shows an instance of the relation lending (Lendingschema). A tuple t in the lending relation has the following intuitive meaning:
Suppose that we wish to add a new loan to our database. Say that the loan is made by the Perryridge branch to Adams in the amount of $1500. Let the loannumber be L31. In our design, we need a tuple with values on all the attributes of Lendingschema. Thus, we must repeat the asset and city data for the Perryridge branch, and must add the tuple to the lending relation. In general, the asset and city data for a branch must appear once for each loan made by that branch.
(Perryridge, Horseneck, 1700000, Adams, L31, 1500)
Sample lending relation.
The repetition of information in our alternative design is undesirable. Repeating information wastes space. Furthermore, it complicates updating the database. Suppose, for example, that the assets of the Perryridge branch change from 1700000 to 1900000. Under our original design, one tuple of the branch relation needs to be changed. Under our alternative design, many tuples of the lending relation need to be changed. Thus, updates are more costly under the alternative design than under the original design. When we perform the update in the alternative database, we must ensure that every tuple pertaining to the Perryridge branch is updated, or else our database will show two different asset values for the Perryridge branch.
That observation is central to understanding why the alternative design is bad.We know that a bank branch has a unique value of assets, so given a branch name we can uniquely identify the assets value. On the other hand, we know that a branch may make many loans, so given a branch name, we cannot uniquely determine a loan number. In other words, we say that the functional dependency
branchname → assets
holds on Lendingschema, but we do not expect the functional dependency branchname → loannumber to hold. The fact that a branch has a particular value of assets, and the fact that a branch makes a loan are independent, and, as we have seen, these facts are best represented in separate relations.We shall see that we can use functional dependencies to specify formally when a database design is good. Another problem with the Lendingschema design is that we cannot represent directly the information concerning a branch (branchname, branchcity, assets) unless there exists at least one loan at the branch. This is because tuples in the lending relation require values for loannumber, amount, and customername.
One solution to this problem is to introduce null values, as we did to handle updates through views. Recall, however, that null values are difficult to handle. If we are not willing to deal with null values, then we can create the branch information only when the first loan application at that branch is made. Worse, we would have to delete this information when all the loans have been paid.Clearly, this situation is undesirable, since, under our original database design, the branch information would be available regardless of whether or not loans are currently maintained in the branch, and without resorting to null values.
Functional Dependencies
Functional dependencies play a key role in differentiating good database designs from bad database designs. A functional dependency is a type of constraint that is a generalization of the notion of key.
Basic Concepts
Functional dependencies are constraints on the set of legal relations. They allow us to express facts about the enterprise that we are modeling with our database.
In Entity Relationship Model, we defined the notion of a superkey as follows. Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1 and t2 of tuples in r such that t1 ≠ t2, then t1[K] ≠ t2[K]. That is, no two tuples in any legal relation r(R) may have the same value on attribute set K. The notion of functional dependency generalizes the notion of superkey. Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency
α →β
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β]. Using the functionaldependency notation, we say that K is a superkey of R if K → R. That is, K is a superkey if, whenever t1[K] = t2[K], it is also the case that
t1[R] = t2[R] (that is, t1 = t2).
Functional dependencies allow us to express constraints that we cannot express with superkeys. Consider the schema Loaninfoschema = (loannumber, branchname, customername, amount) which is simplification of the Lendingschema that we saw earlier. The set of functional dependencies that we expect to hold on this relation schema is
loannumber →amount
loannumber →branchname
We would not, however, expect the functional dependency
loannumber →customername
to hold, since, in general, a given loan can be made to more than one customer (for example, to both members of a husband–wife pair). We shall use functional dependencies in two ways:
If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.
Let us consider the relation r of Figure , to see which functional dependencies are satisfied. Observe that A → C is satisfied. There are two tuples that have an A value of a1. These tuples have the same C value namely, c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional dependency C → A is not satisfied, however. To see that it is not, consider the tuples t1 = (a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These two tuples have the same C values, c2, but they have different A values, a2 and a3, respectively. Thus, we have found a pair of tuples t1 and t2 such that t1[C] = t2[C], but t1[A] ≠ t2[A].
Sample relation r.
Many other functional dependencies are satisfied by r, including, for example, the functional dependency AB → D. Note that we use AB as a shorthand for {A,B}, to conform with standard practice. Observe that there is no pair of distinct tuples t1 and t2 such that t1[AB] = t2[AB]. Therefore, if t1[AB] = t2[AB], it must be that t1 = t2 and, thus, t1[D] = t2[D]. So, r satisfies AB → D.
Some functional dependencies are said to be trivial because they are satisfied by all relations. For example, A → A is satisfied by all relations involving attribute A. Reading the definition of functional dependency literally, we see that, for all tuples t1 and t2 such that t1[A] = t2[A], it is the case that t1[A] = t2[A]. Similarly, AB → A is satisfied by all relations involving attribute A. In general, a functional dependencyof the form α → β is trivial if β ⊆ α. To distinguish between the concepts of a relation satisfying a dependency and a dependency holding on a schema, we return to the banking example. If we consider the customer relation (on Customerschema) in Figure , we see that customerstreet → customercity is satisfied. However, we believe that, in the real world, two cities can have streets with the same name. Thus, it is possible, at some time, to have an instance of the customer relation in which customerstreet → customercity is not satisfied.
So, we would not include customerstreet→ customercity in the set of functional dependencies that hold on Customerschema.
The customer relation.
The loan relation.
In the loan relation (on Loanschema) of Figure , we see that the dependency loannumber → amount is satisfied. In contrast to the case of customercity and customerstreet in Customerschema, we do believe that the realworld enterprise that we are modeling requires each loan to have only one amount. Therefore, we want to require that loannumber→amount be satisfied by the loan relation at all times. In other words, we require that the constraint loannumber → amount hold on Loanschema.
In the branch relation of Figure, we see that branchname → assets is satisfied, as is assets → branchname. We want to require that branchname → assets hold on Branchschema. However, we do not wish to require that assets → branchname hold, since it is possible to have several branches that have the same asset value. In what follows, we assume that, when we design a relational database, we first list those functional dependencies that must always hold. In the banking example, our list of dependencies includes the following:
The branch relation.
Closure of a Set of Functional Dependencies
It is not sufficient to consider the given set of functional dependencies. Rather, we need to consider all functional dependencies that hold.We shall see that, given a set F of functional dependencies, we can prove that certain other functional dependencies hold.We say that such functional dependencies are “logically implied” by F.
More formally, given a relational schema R, a functional dependency f on R is logically implied by a set of functional dependencies F on R if every relation instance r(R) that satisfies F also satisfies f.
Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies
A →B
A →C
CG→ H
CG→ I
B → H
The functional dependency
A→ H
is logically implied. That is, we can show that, whenever our given set of functional dependencies holds on a relation, A→H must also hold on the relation. Suppose that t1 and t2 are tuples such that
t1[A] = t2[A]
Since we are given that A→B, it follows from the definition of functional dependency that
t1[B] = t2[B]
Then, since we are given that B → H, it follows from the definition of functional dependency that
t1[H] = t2[H]
Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be that t1[H] = t2[H]. But that is exactly the definition of A → H. Let F be a set of functional dependencies. The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F. Given F, we can compute F+ directly from the formal definition of functional dependency. If F were large, this process would be lengthy and difficult. Such a computation of F+ requires arguments of the type just used to show that A → H is in the closure of our example set of dependencies.
Axioms, or rules of inference, provide a simpler technique for reasoning about functional dependencies. In the rules that follow, we use Greek letters (α, β, γ, . . . ) for sets of attributes, and uppercase Roman letters from the beginning of the alphabet for individual attributes. We use αβ to denote α ∪ β.We can use the following three rules to find logically implied functional dependencies.
By applying these rules repeatedly, we can find all of F+, given F. This collection of rules is called Armstrong’s axioms in honor of the person who first proposed it.
Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because, for a given set F of functional dependencies, they allow us to generate all F+. The bibliographical notes provide references for proofs of soundness and completeness.
Although Armstrong’s axioms are complete, it is tiresome to use them directly for the computation of F+. To simplify matters further, we list additional rules. It is possible to use Armstrong’s axioms to prove that these rules are correct.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}. We list several members of F+ here:
Another way of finding that AG → I holds is as follows. We use the augmentation rule on A → C to infer AG → CG. Applying the transitivity rule to this dependency and CG → I, we infer AG → I.
Figure shows a procedure that demonstrates formally how to use Armstrong’s axioms to compute F+. In this procedure, when a functional dependency is added to F+, it may be already present, and in that case there is no change to F+. The lefthand and righthand sides of a functional dependency are both subsets of R. Since a set of size n has 2n subsets, there are a total of 2 × 2n = 2n+1 possible functional dependencies, where n is the number of attributes in R. Each iteration of the repeat loop of the procedure, except the last iteration, adds at least one functional dependency to F+. Thus, the procedure is guaranteed to terminate.
Closure of Attribute Sets
To test whether a set α is a superkey, we must devise an algorithm for computing the set of attributes functionally determined by α. One way of doing this is to compute F+, take all functional dependencies with α as the lefthand side, and take the union of the righthand sides of all such dependencies.However, doing so can be expensive, since F+ can be large.
F+ = FFigure A procedure to compute F+.
An efficient algorithm for computing the set of attributes functionally determined by α is useful not only for testing whether α is a superkey, but also for several other tasks, as we will see later in this section.
Let α be a set of attributes.We call the set of all attributes functionally determined by α under a set F of functional dependencies the closure of α under F; we denote it by α+. Figure shows an algorithm, written in pseudocode, to compute α+. The input is a set F of functional dependencies and the set α of attributes. The output is stored in the variable result.
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the functional dependencies .We start with result = AG. The first time that we execute the while loop to test each functional dependency, we find that
The second time that we execute the while loop, no new attributes are added to result, and the algorithm terminates.
Let us see why the algorithm of Figure is correct. The first step is correct, since α→α always holds (by the reflexivity rule).We claim that, for any subset β of result, α→β. Sincewe start the while loop with α→result being true, we can add γ to result only if β ⊆ result and β → γ. But then result → β by the reflexivity rule, so α → β by transitivity. Another application of transitivity shows that α → γ (using α → β and β → γ). The union rule implies that α → result ∪ γ, so α functionally determines any new result generated in the while loop. Thus, any attribute returned by the algorithm is in α+.
It is easy to see that the algorithm finds all α+. If there is an attribute in α+ that is not yet in result, then there must be a functional dependency β → γ for which β ⊆ result, and at least one attribute in γ is not in result.It turns out that, in the worst case, this algorithm may take an amount of time quadratic in the size of F. There is a faster (although slightly more complex) algorithm that runs in time linear in the size of F;
result := α;
while (changes to result) do
for each functional dependency β → γ in F do
begin
if β ⊆ result then result := result ∪ γ;
end
Figure An algorithm to compute α+, the closure of α under F.
There are several uses of the attribute closure algorithm:
Canonical Cover
Suppose that we have a set of functional dependencies F on a relation schema. Whenever a user performs an update on the relation, the database system must ensure that the update does not violate any functional dependencies, that is, all the functional dependencies in F are satisfied in the new database state.
The system must roll back the update if it violates any functional dependencies in the set F. We can reduce the effort spent in checking for violations by testing a simplified set of functional dependencies that has the same closure as the given set. Any database that satisfies the simplified set of functional dependencies will also satisfy the original set, and vice versa, since the two sets have the same closure. However, the simplifiedset is easier to test.We shall see how the simplified set can be constructed in a moment. First, we need some definitions.
An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies. The formal definition of extraneous attributes is as follows. Consider a set F of functional dependencies and the functional dependency α →β in F.
For example, suppose we have the functional dependencies AB → C and A → C in F. Then, B is extraneous in AB → C. As another example, suppose we have the functional dependencies AB → CD and A → C in F. Then C would be extraneous in the righthand side of AB → CD.
Beware of the direction of the implications when using the definition of extraneous attributes: If you exchange the lefthand side with righthand side, the implication will always hold. That is, (F − {α → β}) ∪ {(α − A) → β} always logically implies F, and also F always logically implies (F − {α → β}) ∪ {α → (β − A)}
Here is how we can test efficiently if an attribute is extraneous. Let R be the relation schema, and let F be the given set of functional dependencies that hold on R. Consider an attribute A in a dependency α → β.


Database system concepts Related Tutorials 


Python Tutorial  Teradata Tutorial 
Adv Java Tutorial  SQL Database Tutorial 
JBOSS Tutorial  JSON (JavaScript Object Notation) Tutorial 
All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.