Batch Operations - Firebird

It is a very common requirement to perform a task that inserts, updates, or deletes many rows in a single operation. For example, an application reads a data file from a device and massages it into INSERT statements for posting to a database table, or a replication service processes batches of changes between satellite and parent databases. Many similar statements are posted within a single transaction, usually by means of a prepared statement and replaceable runtime parameters.

Batch Inserts

A statement for a batch insert may look something like this:

INSERT INTO DATATABLE(DATA1, DATA2, DATA3, DATA4, DATA5,....more columns) VALUES ('x', 'y', 'z', 99, '2004-12-25', ....more values);

Using parameters, it looks like this:

INSERT INTO DATATABLE(DATA1, DATA2, DATA3, DATA4, DATA5,....more columns) VALUES (?, '?', '?', ?, ?, ....more values);

Often, a stored procedure will be the most elegant way to set up a repeatable batch operation, especially if there is a requirement to transform data en route to the database table or to insert rows into multiple tables.

Preventing Slowdown on Batches

Two behaviors can cause huge batches to get progressively slower during execution.

  • The maintenance of indexes, which adds work to the task and also tends to distort the geometry of the index trees.
  • The accumulation of back versions, the deltas of the old rows that are marked for obsolescence once the update or delete operations are committed. Back versions do not become available for garbage collection until the transaction completes. In transactions involving back versions numbering in the tens to hundreds of thousands (or more), garbage collection may never win its battle to clean out obsolete versions. Under these conditions, only a backup and restore will sanitize the database completely.

Deactivating Indexes

A strategy for mitigating batch slowdown when other users are not accessing the input table is to deactivate the secondary indexes. This is simple to do:

ALTER INDEX index-name INACTIVE;

When the large batch operation is finished, reactivate and rebuild the indexes:

ALTER INDEX index-name ACTIVE;

Partitioning Large Batches

Unfortunately, you cannot disable the indexes that enforce primary and foreign keys without dropping the actual constraints. Often, it will be impracticable to drop constraints because of chained dependencies. For this reason and others, judicious partitioning of the task is recommended to help offset both the deterioration of index balance and the generation of an unnecessarily large number of new database pages.

When posting large numbers of inserts in batch, it is preferable to partition them into groups and commit work about every 5,000–10,000 rows. The actual size of an optimal partition will vary according to the size of the row and the database page size. Optimal groups may be smaller or larger than this range.

When you need to partition huge batches, a stored procedure is often the way to go. You can make use of local counter variables, event messages, and return values to keep completing procedure calls in sync with the calling client.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics