At times throughout this book, examples are shown using a function within a function and the power it provides. The COUNT aggregate provides another opportunity to demonstrate a capability that might prove itself useful. It combines the DISTINCT and aggregate functions.
The following may be used to determine how many courses are being taken instead of the total number of students (10) with a valid class code:
Versus using all of the values:
It is allowable to use the DISTINCT in multiple aggregates within a SELECT. The only restriction is that all aggregates must use the same column for each DISTINCT function.
Aggregates and Very Large Data Bases (VLDB)
As great as huge databases might be, there are considerations to take into account when processing large numbers of rows. This section enumerates a few of the situations that might be encountered. Read them and think about the requirement or benefit of incorporating them into your SQL.
Potential of Execution Error
Aggregates use the data type of the column they are aggregating. On most databases, this works fine. However, when working on a VLDB, this may cause the SELECT to fail on a numeric overflow condition. An overflow occurs when the value being calculated exceeds the maximum size or value for the data type being used.
For example, one billion (1,000,000,000) is a valid value for an integer column because it is less than 2,147,483,647. However, if three rows each have one billion as their value and a SUM operation is performed, it fails on the third row.
Try the following series of commands to demonstrate an overflow and its fix:
-- Create a table called Overflow with 2 columns
CT Overflow_tbl (Ovr_byte BYTEINT, Ovr_int INT);
-- Insert 3 rows with very large values of 1 billion where max value is
INS overflow_tbl values (1, 10**9);
INS overflow_tbl values (2, 10**9);
INS overflow_tbl values (3, 10**9);
-- A SUM aggregate on these values will result in 3 billion
SEL SUM(ovr_int) AS sum_col FROM overflow_tbl;
***** 2616 numeric overflow
Attempting this SUM, as written, results in a 2616 numeric overflow error. That is because 3 billion is too large to be stored in the default data type of integer. This is the default because of the data type of the column being used within the aggregate. To fix it, use either of the following techniques to convert the data column to a different type before performing the aggregation.
Whenever you find yourself in a situation where the SQL is failing due to a numeric overflow, it is most likely due to the inherited data type of the column. When this happens, be sure to convert the type before doing the math.
GROUP BY versus DISTINCT
As seen, DISTINCT is used to eliminate duplicate values. In this, the GROUP BY is used to consolidate multiple rows with the same value into the same group. It does the consolidation by eliminating duplicates. On the surface, they provide the same functionality.
The next SELECT uses GROUP BY without aggregation to eliminate duplicates:
The GROUP BY without aggregation returns the same rows as the DISTINCT. So the obvious question becomes, which is more efficient? The answer is not a simple one. Instead, something must be known about the characteristics of the data. Generally, with more duplicate data values – GROUP BY is more efficient. However, if only a few duplicates exist – DISTINCT is more efficient. To understand the reason, it is important to know how each of them eliminates the duplicate values.
Technique used to eliminate duplicates (can be seen in EXPLAIN):
Back to the original question: which is more efficient?
Since DISTINCT redistributes the rows immediately, more data may move between the AMPs, compared to GROUP BY that only sends unique values between the AMPs. So, GROUP BY sounds more efficient. However, when you consider that if the data is nearly unique, GROUP BY spends time attempting to eliminate duplicates that do not exist. Therefore, it is wasting the time to check for duplicates the first time. Then, it must redistribute the same amount of data anyway.
Therefore, for efficiency, when there are:
Teradata Related Interview Questions
|Microstrategy Interview Questions||Informatica Interview Questions|
|MySQL Interview Questions||Oracle 11g Interview Questions|
|Hadoop Interview Questions||TeraData DBA Interview Questions|
|MYSQL DBA Interview Questions||Database Administration Interview Questions|
|DB2 SQL Programming Interview Questions||Hadoop Administration Interview Questions|
|Java Hadoop Developer Interview Questions||Informatica MDM Interview Questions|
|Informatica Admin Interview Questions||Hadoop Testing Interview Questions|
Teradata Related Practice Tests
|Microstrategy Practice Tests||Informatica Practice Tests|
|MySQL Practice Tests||Oracle 11g Practice Tests|
|Hadoop Practice Tests||TeraData DBA Practice Tests|
|MYSQL DBA Practice Tests||Database Administration Practice Tests|
|DB2 SQL Programming Practice Tests||Hadoop Administration Practice Tests|
Teradata Parallel Architecture
Fundamental Sql Using Select
On-line Help And Show Commands
Date And Time Processing
Character String Processing
Reporting Totals And Subtotals
Data Definition Language
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.