Other Fun Things - DB2 Using SQL

Randomly Sample Data

One can use the TABLESAMPLE schema to randomly sample rows for subsequent analysis.

Table Sample Syntax

Table Sample Syntax

SELECT ... FROM table name
correrelation name
TABLESAMPLE BERNOULLI (percent)
SYSTEM REPEATABLE (num)

TABLESAMPLE SyntaxNotes

  • The table-name must refer to a real table. This can include a declared global temporary table, or a materialized query table. It cannot be a nested table expression.
  • The sampling is an addition to any predicates specified in the where clause. Under the covers, sampling occurs before any other query processing, such as applying predicates or doing a join.
  • The BERNOUL option checks each row individually.
  • The SYSTEM option lets DB2 find the most efficient way to sample the data. This may mean that all rows on each page that qualifies are included. For small tables, this method often results in an misleading percentage of rows selected.
  • The "percent" number must be equal to or less than 100, and greater than zero. It determines what percentage of the rows processed are returns.
  • The REPEATABLE option and number is used if one wants to get the same result every time the query is run (assuming no data changes). Without this option, each run will be both random and different.

Examples

Sample 5% of the rows in the staff table. Get the same result each time:

Sample rows in two tables

Sample 18% of the rows in the employee table and 25% of the rows in the employee-activity table, then join the two tables together. Because each table is sampled independently, the fraction of rows that join will be much less either sampling rate:

Sample a declared global temporary table, and also apply other predicates:

Sample Views used in Join Examples

Convert Character to Numeric

The DOUBLE, DECIMAL, INTEGER, SMALLINT, and BIGINT functions call all be used to convert a character field into its numeric equivalent:

Convert Character to Numeric - SQL

Not all numeric functions support all character representations of a number. The following table illustrates what's allowed and what's not:

Acceptable conversion values

INPUT STRING COMPATIBLE FUNCTIONS
============ ==========================================
" 1234" DOUBLE, DECIMAL, INTEGER, SMALLINT, BIGINT
" 12.4" DOUBLE, DECIMAL
" 12E4" DOUBLE

Checking the Input

There are several ways to check that the input character string is a valid representation of a number - before doing the conversion. One simple solution involves converting all digits to blank, then removing the blanks. If the result is not a zero length string, then the input must have had a character other than a digit:

Checking for non-digits

WITH temp1 (c1) AS (VALUES ' 123','456 ',' 1 2',' 33%',NULL)
SELECT c1
,TRANSLATE(c1,' ','1234567890') AS c2
,LENGTH(LTRIM(TRANSLATE(c1,' ','1234567890'))) AS c3
FROM temp1;
ANSWER
============
C1 C2 C3
---- ---- --
123 0
456 0
1 2 0
33% % 1
- - -

One can also write a user-defined scalar function to check for non-numeric input, which is what is done below. This function returns "Y" if the following is true:

  • The input is not null.
  • There are no non-numeric characters in the input.
  • The only blanks in the input are to the left of the digits.
  • There is only one "+" or "-" sign, and it is next to the left-side blanks, if any.
  • There is at least one digit in the input.

Now for the code:

Check Numeric function, part 1 of 2

Check Numeric function, part 2 of 2

IF SUBSTR(instr,ctr,1) <> ' ' THEN
SET bgn_blank = 'N';
END IF;
SET ctr = ctr + 1;
END WHILE wloop;
IF found_num = 'N' THEN
SET is_number = 'N';
END IF;
RETURN is_number;
END!
WITH TEMP1 (C1) AS
(VALUES ' 123'
,'+123.45'
,'456 '
,' 10 2 '
,' -.23' ANSWER
,'++12356' ====================
,'.012349' C1 C2 C3
,' 33%' ------- -- ---------
,' ' 123 Y 123.00000
,NULL) +123.45 Y 123.45000
SELECT C1 AS C1 456 N -
,isnumeric(C1) AS C2 10 2 N -
,CASE -.23 Y -0.23000
WHEN isnumeric(C1) = 'Y' +12356 N -
THEN DECIMAL(C1,10,6) .012349 Y 0.01234
ELSE NULL 33% N -
END AS C3 N -
FROM TEMP1! - - -

The CHAR and DIGITS functions can be used to convert a DB2 numeric field to a character representation of the same, but as the following example demonstrates, both functions return problematic output:

CHAR and DIGITS function usage

SELECT d_sal
,CHAR(d_sal) AS d_chr
,DIGITS(d_sal) AS d_dgt
,i_sal
,CHAR(i_sal) AS i_chr
,DIGITS(i_sal) AS i_dgt
FROM (SELECT DEC(salary - 11000,6,2) AS d_sal
,SMALLINT(salary - 11000) AS i_sal
FROM staff
WHERE salary > 10000
AND salary < 12200
)AS xxx ANSWER
ORDER BY d_sal; =========================================
D_SAL D_CHR D_DGT I_SAL I_CHR I_DGT
------- -------- ------ ----- ----- -----
-494.10 -0494.10 049410 -494 -494 00494
-12.00 -0012.00 001200 -12 -12 00012
508.60 0508.60 050860 508 508 00508
1009.75 1009.75 100975 1009 1009 01009

The DIGITS function discards both the sign indicator and the decimal point, while the CHAR function output is (annoyingly) left-justified, and (for decimal data) has leading zeros. We can do better.

Below are three user-defined functions that convert integer data from numeric to character, displaying the output right-justified, and with a sign indicator if negative. There is one function for each flavor of integer that is supported in DB2:

User-defined functions - convert integer to character

Each of the above functions works the same way (working from right to left):

  • First, convert the input number to character using the CHAR function.
  • Next, use the RTRIM function to remove the right-most blanks.
  • Then, concatenate a set number of blanks to the left of the value. The number of blanks appended depends upon the input type, which is why there are three separate functions.
  • Finally, use the RIGHT function to get the right-most "n" characters, where "n" is the maximum number of digits (plus the sign indicator) supported by the input type.

The next example uses the first of the above functions:

Decimal Input

Creating a similar function to handle decimal input is a little more tricky. One problem is that the CHAR function adds leading zeros to decimal data, which we don't want. A more serious problem is that there are many sizes and scales of decimal data, but we can only create one function (with a given name) for a particular input data type.

Decimal values can range in both length and scale from 1 to 31 digits. This makes it impossible to define a single function to convert any possible decimal value to character with possibly running out of digits, or losing some precision.

NOTE: The fact that one can only have one user-defined function, with a given name, per DB2 data type, presents a problem for all variable-length data types - notably character, varchar, and decimal. For character and varchar data, one can address the problem, to some extent, by using maximum length input and output fields. But decimal data has both a scale and a length, so there is no way to make an all-purpose decimal function.

Despite the above, below is a function that converts decimal data to character. It compromises by assuming an input of type decimal(22,2), which should handle most monetary values:

User-defined function - convert decimal to character

The function works as follows:

  • The non-fractional part of the number is converted to BIGINT, then converted to CHAR as previously described.
  • A period (dot) is added to the back of the output.
  • The fractional digits (converted to character using the DIGITS function) are appended to the back of the output.

Below is the function in action:
Convert DECIMAL to CHAR

SELECT d_sal
,char_right(d_sal) AS d_chr
FROM (SELECT DEC(salary - 11000,6,2) AS d_sal
FROM staff
WHERE salary > 10000 ANSWER
AND salary < 12200 ===============
)AS xxx D_SAL D_CHR
ORDER BY d_sal; ------- -------
-494.10 -494.10
-12.00 -12.00
508.60 508.60
1009.75 1009.75

Floating point data can be processed using the above function, as long as it is first converted to decimal using the standard DECIMAL function.

Adding Commas

The next function converts decimal input to character, with embedded comas. It first coverts the value to character - as per the above function. It then steps though the output string, three bytes at a time, from right to left, checking to see if the next-left character is a number. If it is, it insert a comma, else it adds a blank byte to the front of the string:

User-defined function - convert decimal to character - with commas

Convert Timestamp to Numeric

There is absolutely no sane reason why anyone would want to convert a date, time, or timestamp value directly to a number. The only correct way to manipulate such data is to use the provided date/time functions. But having said that, here is how one does it:

Convert Timestamp to number

Selective Column Output

There is no way in static SQL to vary the number of columns returned by a select statement. In order to change the number of columns you have to write a new SQL statement and then rebind. But one can use CASE logic to control whether or not a column returns any data.

Imagine that you are forced to use static SQL. Furthermore, imagine that you do not always want to retrieve the data from all columns, and that you also do not want to transmit data over the network that you do not need. For character columns, we can address this problem by retrieving the data only if it is wanted, and otherwise returning to a zero-length string. To illustrate,

here is an ordinary SQL statement:
Sample query with no column control

Here is the same SQL statement with each character column being checked against a hostvariable. If the host-variable is 1, the data is returned, otherwise a zero-length string:
Sample query with column control

Making Charts Using SQL

Imagine that one had a string of numeric values that one wants to display as a line-bar chart. With a little coding, this is easy to do in SQL:

Make chart using SQL

To create the above graph we first converted the column of interest to an integer field of a manageable length, and then used this value to repeat a single "*" character a set number of times.

One problem with the above query is that we won't know how long the chart will be until we run the statement. This may cause problems if we guess wrongly and we are tight for space, so the next query addresses this issue by creating a chart of known length. To do this, it does the following:

  • First select all of the matching rows and columns and store them in a temporary table.
  • Next, obtain the MAX value from the field of interest. Then covert this value to an integer and divide by the maximum desired chart length (e.g. 20).
  • Finally, join the two temporary tables together and display the chart. Because the chart will never be longer than 20 bytes, we can display it in a 20 byte field.

Now for the code:
Make chart of fixed length

Multiple Counts in One Pass

The STATS table that is defined on page 116 has a SEX field with just two values, 'F' (for female) and 'M' (for male). To get a count of the rows by sex we can write the following:
Use GROUP BY to get counts

Imagine now that we wanted to get a count of the different sexes on the same line of output. One, not very efficient, way to get this answer is shown below. It involves scanning the data table twice (once for males, and once for females) then joining the result.

Use Common Table Expression to get counts


All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

DB2 Using SQL Topics