Semantics - T-SQL

The words used to describe data concepts are often different, depending a great deal upon the context of the discussion. Data lives in tables. Usually, a table represents some kind of business entity, such as a Product or Customer, for example. Each item in a table is called a row or record. For our purposes, these mean the same thing. I may use these words interchangeably throughout the book. Envision several rows in an Excel worksheet representing different products. Each product has a manufacturer, supplier, packaging quantity, and price. In Excel, these values would be contained in different cells. In a table, separate values are referred to as a column or field. As far as we're concerned, these words have the same meaning as well. How do you decide how data should be organized into tables and columns?
That is the fine art of database design and is often no easy task. To arrive at an optimal database design, you must first have a thorough understanding of the business process and the how data will be used.

So, what is data, really? We often hear the words "information” and "data” used to mean the same thing. In reality, they are very different concepts. As humans, we generally concern ourselves with meaningful information we can use day-to-day. Information has a context — it makes sense to us. If my wife asks me to stop by the store on the way home from work and pick up eggs and milk, I should have enough information to accomplish this simple task. I have a few informational items to contend with in this scenario: the store, eggs, and milk. If we were to ask some people in the database business about these simple things, we might get some interesting (or not so interesting) answers. For example, my friend Greg, a city geographic information systems (GIS) expert employed by the city government, might point out that in his database, the store is a building with an address, property plot number, city zoning record, water, sewer, and electrical service locations. It has latitude and longitude coordinates, a business license, and a tax record. If we were to talk to someone in the grocery business, they might tell us that eggs and milk exist in a products table in their point of sale and inventory management database systems. Each is assigned a product record ID and UPC codes. The product supplier, vendors, shipping companies, and the dairies likely have their own systems and deal with these items in different ways. However, as a consumer, I'm not concerned with such things. I just need to stop by the store and pick up the eggs and milk.

Here's the bottom line: data is just numbers and letters in a database or computer application somewhere. At some point, all that cryptic data was probably useful information until it was entered into the database. For the database designer or programmer, these values may be meaningful. For the rest of us, it isn't useful at all until it gets translated back into something we understand information. Part of the job description of SQL programmers is to retrieve data from one or more databases and turn that data into information. By combining data from multiple entities, or simplistically, tables, the SQL programmer puts context to the data, turning it into information.

Changing Terminology

One of the greatest challenges in our relatively new world of technology is how we use common language to communicate both technical and non-technical concepts. Even when dealing with the same system, terminology often changes as you progress through the different stages of the solution design and construction. These stages are generally as follows:

  • Conceptual or architectural design
  • Logical design
  • Physical design

Conceptual Design

As you approach the subject of automating business processes through the use of databases and software, one of the first and most important tasks is to gather business requirements from users and other business stakeholders. Beginning with non-technical, business, and user-centric language, you must find terms to describe each unit of pertinent information to be stored and processed. A complete unit of information is known as an entity. Business entities generally represent a whole unit that can stand on its own. For example, a customer and a product are examples of entities.

Another conceptual unit of information is an attribute. This unit of information describes a characteristic of an entity. An attribute may be something as simple as the color or price of a product. It could also be something more complex, such as the dimensions of a package. The important thing during conceptual design is to deal with the simple and conceptual aspects and not all of the implementation details. This way you leave your options open to consider creative ways to model and manage the data according to your business requirements.

In most processes, different terms may be used to describe the same or similar concepts. For example, in an order processing environment, the terms customer, shopper, and purchaser could mean the same thing. Under closer evaluation, perhaps a shopper is a person who looks for products and a customer is a person who actually purchases a product. In this case, a shopper may become a customer at some point in the process. In some cases, a customer may not actually be a person. A customer could also be an organization. It's important to understand the distinction between each entity and find agreeable terms to be used by anyone dealing with the process, especially non-technical users and business stakeholders. Conceptual design is very free-form and often takes a few iterations to reveal all of the hidden requirements.

Along with the entity and attribute concepts, another important notion is that of an instance. You may have 100,000 customers on record, but as far as your database system is concerned, these customers don't really exist until you need to deal with their information. Sure, these people do exist out in customer land, but your unfeeling database system couldn't care less about customers who are not currently engaged in buying products, spending money, or updating their billing information. Your system was designed to process orders and purchase products that's it. If a customer isn't involved in ordering, purchasing, or paying, the system pays no attention. When a customer places an order, you start caring about this information and your order processing system needs to do something with the customer information. At this point, your system reaches into the repository of would-be customers and activates an instance of a specific customer. The customer becomes alive just long enough for the system to do something useful with it and then put it back into cold storage, moving on to the next task. Therefore, in our database system a single customer record is an instance of the Customer entity. All instances of an entity have prescribed attributes, but they don't necessarily share all attributes. For example, it could be possible that not all customers have phone numbers, or middle names. During the conceptual design phase it's best to identify what attributes must be shared by all instances of an entity and what ones are optional.

Logical Design

This stage of design is the transition between the abstract, non-specific world of conceptual design and the very specific, technical world of physical design. After gaining a thorough understanding of business requirements in the language of users, this is an opportunity to model the data and the information flow through the system processes. With respect to data, you should be able to use the terms entity, attribute, and instance to describe every unit of data. Contrasted with conceptual design, logical design is more formalized and makes use of diagramming models to confirm assumptions made in conceptual design.

Prototyping is also part of the logical design effort. A quick mock-up database can be used to demonstrate design ideas and test business cases. It's important, though, that prototypes aren't allowed to evolve into the production design. Fredrick P. Brooks said in his book, The Mythical Man Month, "When designing a new kind of system, a team should factor in the fact that they will have to throw away the first system that is built since this first system will teach them how to build the system. The system will then be completely redesigned using the newly acquired insights during building of the first system.” When you finally happen upon a working model, throw it out and start fresh. This gives you the opportunity to design a functional solution without the baggage of evolutionary design. In logical design, you decide what you're going to build and for what purpose.

In particular, logical database design involves the definition of all the data entities and their attributes. For example, you know that a customer entity should have a name, a shipping location, and a line of credit. Although you realize that the customer's name may consist of a first name, middle initial, and last name, this is unimportant in this stage of design. Likewise, the customer's location may consist of a street address, city, state, and zip code; you also leave these details for the physical design stage. The point during this stage is to understand the need and recognize how this entity will behave with other data entities and their attributes.

Physical Design

One of the greatest reasons to have a formal design process is to find all of the system requirements before attempting to build the solution. It has been said, "Requirements are like water. They're easier to build on when they're frozen.” So any attempt to define requirements as you go along instead of before you start will inevitably lead to disastrous results. Ask any seasoned software professional and I guarantee their response will be preceded with either a tear or a smile. Physical design is like drawing the blueprints for a building. It's not a sketch or a rough model. It is the specification for the real project in explicit detail. As your design efforts turn to the physical database implementation, entities may turn into tables and attributes into columns. However, there is not always a one-to-one correspondence between conceptual entities and physical tables. That is the simplification I mentioned earlier. The value of appropriate design is to find similarities and reduce redundant effort. You will likely discover the need for more detail than originally envisioned. For instance, a company may have customers that are individuals, but other customers might be a collection of individuals or a corporation.

I worked on a project for a training center where much more detail was needed than first believed. After careful requirements gathering, we found that students attending a class may have paid for the training themselves or had their entire class paid for by their employer. In addition students might have had to pay part of the cost themselves with their employer covering the rest. Then there were the cases that part of the cost was paid by the employer, part by the student, and another part by state agencies. Who was the customer in this last case? As it turned out, according to the company's business rules, the individual student, the employer, and the state agency were all customers. As a result, in the physical design, entities of customer, transaction, and invoice were designed to encompass situations where more than one customer paid for an individual instance of a course.


Although entity relationships were discussed briefly in Chapter(Introducing T-SQL and Data Management Systems), I want to devote a little more time expounding on the concepts to add clarity to the current topic of design. The purpose of nearly all database systems is to model elements in our physical world. To do this effectively, you need to consider the associations that exist between the various entities you want to keep track of. This concept of an item or multiple items being related to a different item or multiple items is known as cardinality or multiplicity. To illustrate this concept, just look around you. Nearly everything fits into some kind of collection or set of like objects. The leaves on a tree, the passengers in a car, and the change in your pocket are all examples of this simple principle. These are sets of similar objects in a collection and associated with some kind of container or attached to some type of parent entity. In the previous section, I described the somewhat complex scenario of many customers and a single transaction. In that case, a relationship was defined between an instance of a class, an instance of a student, an instance of multiple invoices, and instances of multiple customers. Relationships can be described and discovered using common language. As you describe associations, listen for words such as is, have, and has. For example, a customer has orders. Now turn it around: an order has a customer. By looking at the equation from both sides, you've discovered a one-to-many relationship between customers and orders.

Relationships generally can be grouped into three different types of cardinality:

  • One-to-one
  • One-to-many
  • Many-to-many

The one-to-one and one-to-many relationships are fairly easy to define using a combination of foreign keys and unique constraints, but many-to-many relationships cannot actually be defined using two tables. To reduce redundancy, minimize data storage requirements, and facilitate these relationships, you apply standard rules of normalization (the rules of normal form), which are described briefly in this section.

Primary Keys

According to the rule of first normal form (1NF), which says that each column must contain a single type of information or a single value with no repeating groups of data, it is imperative that each row (or record) be stamped with a unique key value. This key could be either a meaningful value that is useful for other reasons, or a surrogate key, a value generated only for the sake of uniqueness. The uniqueness of a record depends entirely on the primary key. Be very cautious and think twice (or three times) before choosing to use non-surrogate key values. I've designed more than a few database systems where it seemed to make sense to use an intelligent value for the primary key (for example, social security number, address, phone number, product code, and so on) and later wished I had just generated a unique value for the key. Most experienced database folks have horror stories to share about such experiences.

So, what's a surrogate key? Two common forms of surrogate key values exist, although there certainly could be more. The first and probably most common form is the IDENTITY property. A SQL Server IDENTITY property is simply an integer value that is automatically incremented by the database system. In the world of Microsoft Access, the identity function is known as an Auto-Number field. SQL Server's IDENTITY property is more flexible, however, since the integer can be seeded at any supported value and incremented by an integer value. This can serve as a unique value as long as all data is entered into a single instance of the database and uniqueness on the identity value is enforced. In distributed systems consisting of multiple, disconnected databases, it can be a bit challenging and next to impossible to keep these values unique. In these cases another type of automatically generated key can be used. This special data type is called a unique identifier or globally unique identifier (GUID). This SQL data type is equipped to store a large binary value that is automatically generated by the system. A complex algorithm is used to produce a value that is partially random and partially predictable. The result is what I call a big ugly number. It is statistically guaranteed to be unique anytime and anywhere. The emphasis on "statistically guaranteed" is intentional. The reason is that chances of this value being duplicated are astronomically improbable, but not impossible.

To see surrogate keys in action, you need to build a table that will auto-populate field values. We will cover building objects with T-SQL in Chapter (Creating and Managing Database Objects), but to clarify the principle of surrogate keys it is useful to take a sneak peek.

The following code creates a table that uses a unique identifier data type with the NEWID() function along with an integer with an IDENTITY() function to auto-populate field values when new rows are added to the table:


The IDENTITY() function can accept two values to specify which number to start with and which number to increment by. If the values are not specified, the function defaults to starting at one and incrementing by one, which is also what we specified specifically. It's important to note that neither the Identity Column nor the GUID column in the preceding example will enforce uniqueness by default, so it is very important to add some type of unique value enforcement to the table to avoid a duplicate from being manually added. This enforcement takes the form of primary key constraints, unique constraints, and unique indexes. Constraints are covered in more detail in Chapter (Creating and Managing Database Objects).

Now that we have a table that auto-populates fields, let's give it a try. For both SQL Server 2005 and SQL Server 2008 the following code will add two rows of data to the new table:


Foreign Keys

One purpose for keys is to enforce the relationship between the records in one table to those in another table. A column in the table containing related records is designated as a foreign key. This usually means that it contains the same values found in the primary key column(s) of the primary table. However, with SQL Server it could also mean that it contains the same values as those defined by an object known as a unique constraint. The only difference between a primary key and a unique constraint is that all primary key values must be defined, whereas a SQL Server unique constraint allows for a single undefined, or null, value. Unlike a primary key or unique constraint, a foreign key doesn't have to be unique. Using the Customer/Order example, one customer can have multiple orders but one order has only one customer. This describes a one-to-many relationship. The primary key column of the Customer table is related to the foreign key column of the Order table through a relationship known as a foreign key constraint. Later, in Chapter (Creating and Managing Database Objects), you see how this relationship is defined in T-SQL.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd Protection Status

T-SQL Topics