# SQL Components - DB2 Using SQL

DB2 Objects
DB2 is a relational database that supports a variety of object types. In this section we shall overview those items which one can obtain data from using SQL.

Table
A table is an organized set of columns and rows. The number, type, and relative position, of the various columns in the table is recorded in the DB2 catalogue. The number of rows in the table will fluctuate as data is inserted and deleted. The CREATE TABLE statement is used to define a table. The following example will define the EMPLOYEE table, which is found in the DB2 sample database.

DB2 sample table – EMPLOYEE

View
A view is another way to look at the data in one or more tables (or other views). For example, a user of the following view will only see those rows (and certain columns) in the EMPLOYEE table where the salary of a particular employee is greater than or equal to the average salary for their particular department.

SELECT from a view that has its own data

DB2 sample view - EMPLOYEE_VIEW
A view need not always refer to an actual table. It may instead contain a list of values:

Define a view using a VALUES clause

We can go one step further and define a view that begins with a single value that is then manipulated using SQL to make many other values. For example, the following view, when selected from, will return 10,000 rows. Note however that these rows are not stored anywhere in the database - they are instead created on the fly when the view is queried.

Define a view that creates data on the fly

Alias
An alias is an alternate name for a table or a view. Unlike a view, an alias can not contain any processing logic. No authorization is required to use an alias other than that needed to access to the underlying table or view.

Define three aliases, the latter on the earlier

Neither a view, nor an alias, can be linked in a recursive manner (e.g. V1 points to V2, which points back to V1). Also, both views and aliases still exist after a source object (e.g. a table) has been dropped. In such cases, a view, but not an alias, is marked invalid.

Nickname
A nickname is the name that one provides to DB2 for either a remote table, or a non-relational object that one wants to query as if it were a table.

Define a nickname

Table sample
Use of the optional TABLESAMPLE reference enables one to randomly select (sample) some fraction of the rows in the underlying base table:

TABLE SAMPLE example

DB2 Data Types
DB2 comes with the following standard data types:

• SMALLINT, INT, and BIGINT (i.e. integer numbers).
• FLOAT, REAL, and DOUBLE (i.e. floating point numbers).
• DECIMAL and NUMERIC (i.e. decimal numbers).
• CHAR, VARCHAR, and LONG VARCHAR (i.e. character values).
• GRAPHIC, VARGRAPHIC, and LONG VARGRAPHIC (i.e. graphical values).
• BLOB, CLOB, and DBCLOB (i.e. binary and character long object values).
• DATE, TIME, and TIMESTAMP (i.e. date/time values).
• XML (i.e. contains well formed XML data).

Below is a simple table definition that uses some of the above data types:

Sample table definition

In the above table, we have listed the relevant columns, and added various checks to ensure that the data is always correct. In particular, we have included the following:

• The sales# is automatically generated (see page 257 for details). It is also the primary key of the table, and so must always be unique.
• The payment-type must be one of two possible values.
• Both the sales-value and the num-items must be greater than zero.
• The employee# must already exist in the staff table. Furthermore, once a row has been inserted into this table, any attempt to delete the related row from the staff table will fail.

Default Lengths
The following table has two columns:

Table with default column lengths

The length has not been provided for either of the above columns. In this case, DB2 defaults to CHAR(1) for the first column and DECIMAL(5,0) for the second column.

Data Type Usage
In general, use the standard DB2 data types as follows:

• Always store monetary data in a decimal field.
• Store non-fractional numbers in one of the integer field types.
• Use floating-point when absolute precision is not necessary.

A DB2 data type is not just a place to hold data. It also defines what rules are applied when the data in manipulated. For example, storing monetary data in a DB2 floating-point field is a no-no, in part because the data-type is not precise, but also because a floating-point number is not manipulated (e.g. during division) according to internationally accepted accounting rules.

Date/Time Arithmetic
Manipulating date/time values can sometimes give unexpected results. What follows is a brief introduction to the subject. The basic rules are:

• Multiplication and division is not allowed.
• Subtraction is allowed using date/time values, date/time durations, or labeled durations.
• Addition is allowed using date/time durations, or labeled durations.

The valid labeled durations are listed below:

Labeled Durations and Date/Time Types

LABELED DURATIONS ITEM WORKS WITH DATE/TIME<------------------------> FIXED <---------------------> SINGULAR PLURAL SIZE DATE TIME TIMESTAMP =========== ============ ===== ==== ==== ========= YEAR YEARS N Y - Y MONTH MONTHS N Y - Y DAY DAYS Y Y - Y HOUR HOURS Y - Y Y MINUTE MINUTES Y - Y Y SECOND SECONDS Y - Y Y MICROSECOND MICROSECONDS Y - Y Y

Usage Notes

• It doesn't matter if one uses singular or plural. One can add "4 day" to a date.
• Some months and years are longer than others. So when one adds "2 months" to a date the result is determined, in part, by the date that you began with. More on this below.
• One cannot add "minutes" to a date, or "days" to a time, etc.
• One cannot combine labeled durations in parenthesis: "date - (1 day + 2 months)" will fail. One should instead say: "date - 1 day - 2 months".
• Adding too many hours, minutes or seconds to a time will cause it to wrap around. The overflow will be lost.
• Adding 24 hours to the time '00.00.00' will get '24.00.00'. Adding 24 hours to any other time will return the original value.
• When a decimal value is used (e.g. 4.5 days) the fractional part is discarded. So to add (to a timestamp value) 4.5 days, add 4 days and 12 hours.

Now for some examples:
Example, Labeled Duration usage

ANSWER ========== SELECT sales_date <= 1995-12-31 ,sales_date - 10 DAY AS d1 <= 1995-12-21 ,sales_date + -1 MONTH AS d2 <= 1995-11-30 ,sales_date + 99 YEARS AS d3 <= 2094-12-31 ,sales_date + 55 DAYS - 22 MONTHS AS d4 <= 1994-04-24 ,sales_date + (4+6) DAYS AS d5 <= 1996-01-10 FROM sales WHERE sales_person = 'GOUNOT' AND sales_date = '1995-12-31'

Adding or subtracting months or years can give somewhat odd results when the month of the beginning date is longer than the month of the ending date. For example, adding 1 month to '2004-01-31' gives '2004-02-29', which is not the same as adding 31 days, and is not the same result that one will get in 2005. Likewise, adding 1 month, and then a second 1 month to '2004-01-31' gives '2004-03-29', which is not the same as adding 2 months. Below are some examples of this issue:

Date/Time Duration Usage
When one date/time value is subtracted from another date/time value the result is a date, time, or timestamp duration. This decimal value expresses the difference thus:

Date/Time Durations

DURATION-TYPE FORMAT NUMBER-REPRESENTS USE-WITH-D-TYPE ============= ============= ===================== =============== DATE DECIMAL(8,0) yyyymmdd TIMESTAMP, DATE TIME DECIMAL(6,0) hhmmss TIMESTAMP, TIME TIMESTAMP DECIMAL(20,6) yyyymmddhhmmss.zzzzzz TIMESTAMP

Below is an example of date duration generation:

Date Duration Generation

A date/time duration can be added to or subtracted from a date/time value, but it does not make for very pretty code:

Subtracting a Date Duration

Date/Time Subtraction
One date/time can be subtracted (only) from another valid date/time value. The result is a date/time duration value.

DB2 Special Registers
A special register is a DB2 variable that contains information about the state of the system. The complete list follows:

DB2 Special Registers

SPECIAL REGISTER UPDATE DATA-TYPE =============================================== ====== ============= CURRENT CLIENT_ACCTNG no VARCHAR(255) CURRENT CLIENT_APPLNAME no VARCHAR(255) CURRENT CLIENT_USERID no VARCHAR(255) CURRENT CLIENT_WRKSTNNAME no VARCHAR(255) CURRENT DATE no DATE CURRENT DBPARTITIONNUM no INTEGER CURRENT DEFAULT TRANSFORM GROUP yes VARCHAR(18) CURRENT DEGREE yes CHAR(5) CURRENT EXPLAIN MODE yes VARCHAR(254) CURRENT EXPLAIN SNAPSHOT yes CHAR(8) CURRENT FEDERATED ASYNCHRONY yes INTEGER CURRENT IMPLICIT XMLPARSE OPTION yes VARCHAR(128) CURRENT ISOLATION yes CHAR(2) CURRENT LOCK TIMEOUT yes INTEGER CURRENT MAINTAINED TABLE TYPES FOr OPTIMIZATION yes VARCHAR(254) CURRENT PACKAGE PATH yes VARCHAR(4096) CURRENT PATH yes VARCHAR(254) CURRENT QUERY OPTIMIZATION yes INTEGER CURRENT REFRESH AGE yes DECIMAL(20,6) CURRENT SCHEMA yes VARCHAR(128) CURRENT SERVER no VARCHAR(18) CURRENT TIME no TIME CURRENT TIMESTAMP no TIMESTAMP CURRENT TIMEZONE no DECIMAL(6,0) CURRENT USER no VARCHAR(128) SESSION_USER yes VARCHAR(128) SYSTEM_USER no VARCHAR(128) USER yes VARCHAR(128)

Usage Notes

Some special registers can be referenced using an underscore instead of a blank in the name - as in: CURRENT_DATE.

• Some special registers can be updated using the SET command (see list above).
• All special registers can be queried using the SET command. They can also be referenced in ordinary SQL statements.
• Those special registers that automatically change over time (e.g. current timestamp) are always the same for the duration of a given SQL statement. So if one inserts a thousand rows in a single insert, all will get the same current timestamp.
• One can reference the current timestamp in an insert or update, to record in the target table when the row was changed. To see the value assigned, query the DML statement.

Sample SQL
Using Special Registers

Distinct Types
A distinct data type is a field type that is derived from one of the base DB2 field types. It is used when one wants to prevent users from combining two separate columns that should never be manipulated together (e.g. adding US dollars to Japanese Yen). One creates a distinct (data) type using the following syntax:

Create Distinct Type Syntax

The following source types do not support distinct types: LOB, LONG VARCHAR, LONG VARGRAPHIC, and DATALINK. The creation of a distinct type, under the covers, results in the creation two implied functions that can be used to convert data to and from the source type and the distinct type. Support for the basic comparison operators (=, <>, <, <=, >, and >=) is also provided.
Below is a typical create and drop statement:

Create and drop distinct type

A distinct type cannot be dropped if it is currently being used in a table.

Usage Example
Imagine that we had the following customer table:

Sample table, without distinct types

One problem with the above table is that the user can add the American and European sales values, which if they are expressed in dollars and euros respectively, is silly:

Silly query, but works

To prevent the above, we can create two distinct types:

Create Distinct Type examples

Now we can define the customer table thus:

Sample table, with distinct types

(id INTEGER NOT NULL ,fname VARCHAR(00010) NOT NULL WITH DEFAULT '' ,lname VARCHAR(00015) NOT NULL WITH DEFAULT '' ,date_of_birth DATE ,citizenship CHAR(03) ,usa_sales USA_DOLLARS ,eur_sales EUR_DOLLARS ,sales_office# SMALLINT ,last_updated TIMESTAMP ,PRIMARY KEY(id));

Now, when we attempt to run the following, it will fail:

Silly query, now fails

The creation of a distinct type, under the covers, results in the creation two implied functions that can be used to convert data to and from the source type and the distinct type. In the next example, the two monetary values are converted to their common decimal source type, and then added together:

Silly query, works again

SELECT Statement
A SELECT statement is used to query the database. It has the following components, not all of which need be used in any particular query:

• SELECT clause. One of these is required, and it must return at least one item, be it a column, a literal, the result of a function, or something else. One must also access at least one table, be that a true table, a temporary table, a view, or an alias.
• WITH clause. This clause is optional. Use this phrase to include independent SELECT statements that are subsequently accessed in a final SELECT
• ORDER BY clause. Optionally, order the final output
• FETCH FIRST clause. Optionally, stop the query after "n" rows .If an optimize- for value is also provided, both values are used independently by the optimizer.
• READ-ONLY clause. Optionally, state that the query is read-only. Some queries are inherently read-only, in which case this option has no effect.
• FOR UPDATE clause. Optionally, state that the query will be used to update certain columns that are returned during fetch processing.
• OPTIMIZE FOR n ROWS clause. Optionally, tell the optimizer to tune the query assuming that not all of the matching rows will be retrieved. If a first-fetch value is also provided, both values are used independently by the optimizer. Some of the more interesting options are described below.

SELECT Statement Syntax (general)

SELECT Clause
Every query must have at least one SELECT statement, and it must return at least one item, and access at least one object.

SELECT Statement Syntax

SELECT Items

• Column: A column in one of the table being selected from.
• Literal: A literal value (e.g. "ABC"). Use the AS expression to name the literal.
• Special Register: A special register (e.g. CURRENT TIME).
• Expression: An expression result (e.g. MAX(COL1*10)).
• Full Select: An embedded SELECT statement that returns a single row.

FROM Objects

• Table: Either a permanent or temporary DB2 table.
• View: A standard DB2 view.
• Alias: A DB2 alias that points to a table, view, or another alias.
• Full Select: An embedded SELECT statement that returns a set of rows.

Sample SQL
Sample SELECT statement

To select all of the columns in a table (or tables) one can use the "*" notation:
Use "*" to select all columns in table

To select both individual columns, and all of the columns (using the "*" notation), in a single SELECT statement, one can still use the "*", but it must fully-qualified using either the object name, or a correlation name:

Select an individual column, and all columns

Use the following notation to select all the fields in a table twice:

Select all columns twice

FETCH FIRST Clause
The fetch first clause limits the cursor to retrieving "n" rows. If the clause is specified and no number is provided, the query will stop after the first fetch.

Fetch First clause Syntax

If this clause is used, and there is no ORDER BY, then the query will simply return a random set of matching rows, where the randomness is a function of the access path used and/or the physical location of the rows in the table:

FETCH FIRST without ORDER BY, gets random rows

WARNING: Using the FETCH FIRST clause to get the first "n" rows can sometimes return an answer that is not what the user really intended. See below for details.

If an ORDER BY is provided, then the FETCH FIRST clause can be used to stop the query after a certain number of what are, perhaps, the most desirable rows have been returned. However, the phrase should only be used in this manner when the related ORDER BY uniquely identifies each row returned.

To illustrate what can go wrong, imagine that we wanted to query the STAFF table in order to get the names of those three employees that have worked for the firm the longest - in order to give them a little reward (or possibly to fire them). The following query could be run:

FETCH FIRST with ORDER BY, gets wrong answer

The above query answers the question correctly, but the question was wrong, and so the answer is wrong. The problem is that there are two employees that have worked for the firm for ten years, but only one of them shows, and the one that does show was picked at random by the query processor. This is almost certainly not what the business user intended. The next query is similar to the previous, but now the ORDER ID uniquely identifies each row returned (presumably as per the end-user's instructions):

FETCH FIRST with ORDER BY, gets right answer

The correlation name is defined in the FROM clause and relates to the preceding object name. In some cases, it is used to provide a short form of the related object name. In other situations, it is required in order to uniquely identify logical tables when a single physical table is referred to twice in the same query. Some sample SQL follows:

Correlation Name usage example

Correlation name usage example

Renaming Fields
The AS phrase can be used in a SELECT list to give a field a different name. If the new name is an invalid field name (e.g. contains embedded blanks), then place the name in quotes:

Renaming fields using AS

The new field name must not be qualified (e.g. A.C1), but need not be unique. Subsequent usage of the new name is limited as follows:

• It can be used in an order by clause.
• It cannot be used in other part of the select (where-clause, group-by, or having).
• It cannot be used in an update clause.
• It is known outside of the full-select of nested table expressions, common table expressions, and in a view definition.

View field names defined using AS

Working with Nulls
In SQL something can be true, false, or null. This three-way logic has to always be considered when accessing data. To illustrate, if we first select all the rows in the STAFF table where the SALARY is < $10,000, then all the rows where the SALARY is >=$10,000, we have not necessarily found all the rows in the table because we have yet to select those rows where the SALARY is null. The presence of null values in a table can also impact the various column functions. For example, the AVG function ignores null values when calculating the average of a set of rows.This means that a user-calculated average may give a different result from a DB2 calculated equivalent:

AVG of data containing null values

Null values can also pop in columns that are defined as NOT NULL. This happens when a field is processed using a column function and there are no rows that match the search criteria:

Getting a NULL value from a field defined NOT NULL

Why Nulls Exist
Null values can represent two kinds of data. In first case, the value is unknown (e.g. we do not know the name of the person's spouse). Alternatively, the value is not relevant to the situation (e.g. the person does not have a spouse). Many people prefer not to have to bother with nulls, so they use instead a special value when necessary (e.g. an unknown employee name is blank). This trick works OK with character data, but it can lead to problems when used on numeric values (e.g. an unknown salary is set to zero).

Locating Null Values
One can not use an equal predicate to locate those values that are null because a null value does not actually equal anything, not even null, it is simply null. The IS NULL or IS NOT NULL phrases are used instead. The following example gets the average commission of only those rows that are not null. Note that the second result differs from the first due to rounding loss.

AVG of those rows that are not null

Quotes and Double-quotes
To write a string, put it in quotes. If the string contains quotes, each quote is represented by a pair of quotes:
Quote usage

Double quotes can be used to give a name to a output field that would otherwise not be valid. To put a double quote in the name, use a pair of quotes:

Double-quote usage