Performance Issues With Data Maintenance in Teradata

The very mention of changing data on disk implies that space must be managed by the AMP(s) that own the row(s) being modified. Data cannot be changed unless it is first read from disk.

For INSERT operations, a new block might be written or an existing block might be modified to contain the new data row. The choice of which to use depends on whether or not there is sufficient space on the disk to contain the original block plus the number of bytes in the new row.

If the new row causes the block to increase beyond the current number of sectors, the AMP must locate an empty slot with enough contiguous sectors to hold the larger block. Then, it can allocate this new area for the larger block.

A DELETE makes one or more blocks shorter. Therefore, it should never have to find a larger slot in which to write the block back to disk. However, it still has to read the existing block, remove the appropriate rows, and rewrite the smaller block.

The UPDATE is more unpredictable than either the DELETE or the INSERT. This is because an UPDATE might increase the size of the block like the INSERT, decrease the size like the DELETE or not change the size at all.

A larger block might result from one of the following conditions:

  • A NULL value was compressed and now must be expanded to contain a value. This is the most likely situation.
  • A longer character value is stored into a VARCHAR column.

A smaller block might result from one of these conditions:

  • A data value is changed to a NULL value with compression. This is the most likely situation.
  • A smaller character literal is stored into a VARCHAR column.

The block size does not change under either of these conditions (illustrated in the sketch after this list):

  • The column is a fixed-length CHAR; regardless of the length of the actual character data value, the stored length stays at the maximum defined.
  • All numeric columns are stored in their maximum number of bytes.
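
To make the three cases concrete, here is a minimal sketch in Teradata SQL. The database name sandbox, the table emp_demo, and all column names are hypothetical, chosen only to illustrate which UPDATE statements can grow, shrink, or leave a block unchanged.

    CREATE TABLE sandbox.emp_demo
    ( emp_no    INTEGER NOT NULL
    , last_name CHAR(20)               -- fixed length: stored at 20 bytes regardless of the value
    , comments  VARCHAR(200)           -- variable length: the row can grow or shrink
    , bonus     DECIMAL(9,2) COMPRESS  -- NULLs are compressed out of the row
    ) UNIQUE PRIMARY INDEX (emp_no);

    /* Likely to grow the block: a compressed NULL must now occupy space */
    UPDATE sandbox.emp_demo SET bonus = 500.00 WHERE emp_no = 1001;

    /* Likely to shrink the block: the value is compressed back to nothing */
    UPDATE sandbox.emp_demo SET bonus = NULL WHERE emp_no = 1001;

    /* No size change: CHAR(20) occupies its maximum length either way */
    UPDATE sandbox.emp_demo SET last_name = 'Li' WHERE emp_no = 1001;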

There are many reasons for performance gains or losses. Another consideration, mentioned previously, is the Transient Journal entries written for recovery and rollback processing. The Transient Journal is mandatory and cannot be disabled; without it, data integrity cannot be guaranteed.
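
As a brief illustration of the Transient Journal at work, the following sketch (in Teradata BTET session mode, reusing the hypothetical table above) opens an explicit transaction and rolls it back; the before-images held in the Transient Journal are what make the rollback possible.

    BT;                          /* begin an explicit transaction */

    UPDATE sandbox.emp_demo
    SET    comments = 'under review'
    WHERE  emp_no = 1001;        /* a before-image goes to the Transient Journal */

    ROLLBACK;                    /* the before-image is reapplied; the change disappears */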

Impact of FALLBACK on Row Modification

Using FALLBACK on a table negatively impacts the processing time when changing rows within it. This is due to the fact that the same change must also be made on the AMP storing the FALLBACK copy of the row(s) involved. These changes involve additional disk I/O operations and the use of two AMPs instead of one for each row INSERT, UPDATE, or DELETE. That equates to twice as much I/O activity.
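
The trade-off shows up directly in the DDL. In this sketch (hypothetical object names again), the first statement creates a FALLBACK-protected table, so every row change is applied twice; the second removes the protection if the doubled maintenance I/O is not justified.

    CREATE TABLE sandbox.emp_fb, FALLBACK
    ( emp_no  INTEGER NOT NULL
    , dept_no INTEGER
    ) UNIQUE PRIMARY INDEX (emp_no);

    /* Each INSERT/UPDATE/DELETE now hits two AMPs: primary copy plus FALLBACK copy */

    ALTER TABLE sandbox.emp_fb, NO FALLBACK;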

Impact of PERMANENT JOURNAL Logging on Row Modification

Using PERMANENT JOURNAL logging on a table negatively impacts the processing time when changing rows within it. This is due to the fact that the update processing also inserts a copy of the row into the journal table. If BEFORE journals are used, a copy of the row as it existed before the change is placed into the log table. When AFTER images are requested, a copy of the row that looks exactly like the changed row is inserted into the journal table.

There is another issue to consider for journaling, based on SINGLE or DUAL journaling. DUAL asks for a second (mirror) copy of each journal row to be inserted. It is the journal's way of providing FALLBACK-style protection without requiring the table to use FALLBACK. The caution here is that if the table is FALLBACK protected, so are the journals. This further impacts the performance of the row modification.
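
The journal options are declared in the table DDL. The following sketch assumes a permanent journal table (sandbox.jrnl, a hypothetical name) has already been defined for the database; the option keywords themselves are standard Teradata DDL.

    CREATE TABLE sandbox.emp_jn, FALLBACK,
      WITH JOURNAL TABLE = sandbox.jrnl,
      BEFORE JOURNAL,       -- a single copy of each row as it looked before the change
      DUAL AFTER JOURNAL    -- two (mirror) copies of each row as it looks after the change
    ( emp_no INTEGER NOT NULL
    , salary DECIMAL(9,2)
    ) UNIQUE PRIMARY INDEX (emp_no);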

Impact of Primary Index on Row Modification

In Teradata, all tables must have a Primary Index (PI). It is a normal and very important part of the storage and retrieval of rows for all tables. Therefore, there is no additional overhead processing involved in an INSERT or DELETE operation for Primary Indices.

However, when an UPDATE changes the data value of a PI column, more processing is required than when changing the content of any other column. This is due to the fact that the original row must be read, deleted from its current AMP, and then rehashed, redistributed, and inserted on another AMP based on the new data value.

Remember that Primary Keys do not allow changes, but Primary Indexes do. Since the PI may be a column that is not the Primary Key, that restriction does not apply. However, be aware that it takes more processing, and therefore more time, to successfully complete the operation when a PI column is being modified.
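
The difference is easy to see in an EXPLAIN. In this sketch (hypothetical names as before), the first UPDATE changes a non-PI column and is handled on the row's owning AMP; the second changes the PI value itself, which forces a delete, a rehash, and an insert that probably lands on a different AMP.

    /* Non-PI column: the row is modified in place on its owning AMP */
    UPDATE sandbox.emp_demo SET comments = 'transferred' WHERE emp_no = 1001;

    /* PI column: delete from the current AMP, rehash, insert on the new AMP */
    EXPLAIN
    UPDATE sandbox.emp_demo SET emp_no = 2001 WHERE emp_no = 1001;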

Impact of Secondary Indices on Row Modification

In Teradata, a Secondary Index is optional. Currently, a table may have up to 32 secondary indices, and each index may be a combination of up to 16 columns within the table. Every index has its own subtable, and every unique data value in a defined index has a row in that subtable; for a NUSI (Non-Unique Secondary Index), there is potentially one such row on each AMP.

Using secondary indices on a table may also negatively impact the processing time when changing rows within it. This is due to the fact that when a column is part of an index and its data value is changed in the base table, the index value must also be changed in the subtable.

This normally requires that a row be read, deleted, and inserted into a subtable when the column is involved in a USI (Unique Secondary Index). Remember that the delete and the insert will probably be on different AMP processors.

For a NUSI, the processing all takes place on the same AMP. This is referred to as AMP-local. At first glance this sounds like a good thing. However, the processing still requires a read of the old NUSI subtable row, a modification, and a rewrite. Then it is most likely necessary to insert an index row into the subtable for the new value. If a subtable row for that value already exists, Teradata must read it, append the row identifier of the changed base row, and rewrite it back into the subtable. This is why it is important not to create a Primary Index or a Secondary Index on data that often changes.

The point of this discussion is simple. If secondary indices are used, additional processing is involved when the data value of the index is changed. This is true on an INSERT, a DELETE and an UPDATE. So, if a secondary index is defined, make sure that the SQL is using it to receive the potential access speed benefit. An EXPLAIN can provide this information. If it is not being used, drop the index.
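
That verify-or-drop cycle might look like the following sketch, with dept_nusi as a hypothetical index name on the table from earlier:

    /* Create a NUSI on a column used in equality predicates */
    CREATE INDEX dept_nusi (dept_no) ON sandbox.emp_fb;

    /* Check whether the optimizer actually uses it; the EXPLAIN
       text will mention the index if it does */
    EXPLAIN SELECT * FROM sandbox.emp_fb WHERE dept_no = 100;

    /* If it is never used, drop it so row changes stop paying
       the subtable maintenance cost */
    DROP INDEX dept_nusi ON sandbox.emp_fb;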

As an added note, when using composite secondary indices, the same column can be included in multiple indices. When this is the case, any change to that column's data value requires changes in multiple subtables. The previous AMP and subtable processing overhead is multiplied by the number of indices in which the column is defined. Therefore, it becomes even more important to choose columns with a low probability of change.


