Transaction “Aging” and Statistics - Firebird

When a client process secures a transaction handle, it acquires a unique internal identifier for the transaction and stores it on a transaction inventory page (TIP).

Transaction ID and Age

Transaction IDs (TIDs) are 32-bit integers that are generated in a single-stepping series. A new or freshly restored database begins the series at 1. Transaction age is determined from the TID: lowest is oldest.

TIDs and their associated state data are stored on TIPs. On the database header page, system accounting maintains a set of fields containing TIDs that are of interest to it, namely the oldest interesting transaction (OIT), the oldest active transaction (OAT), and the number to be used for the next transaction. A “snapshot” TID is also written each time the OAT increases—usually the same TID as the OAT or close to it.

Getting the Transaction ID

Firebird 1.5 and later provide the context variable CURRENT_TRANSACTION that returns the TID of the transaction that requests it. It can be used in SQL wherever it is appropriate to use an expression. For example, to store the TID into a log table, you could use it like this:

INSERT INTO BOOK_OF_LIFE
(TID, COMMENT, SIGNATURE)
VALUES
(CURRENT_TRANSACTION,
'This has been a great day for transactions',
CURRENT_USER);

It is important to remember that TIDs are cyclic. Because the numbering series will be reset after each restore, they should not be used for primary keys or unique constraints.

Transaction ID Overflow

As previously mentioned, the TID is a 32-bit integer. If a series hit its 4GB limit and rolls over, bad things will happen. When the last transaction ends, the system transaction will not work and metadata updates will be impossible. Garbage collection will stop. User transactions will not start.

At 100 transactions per second, it takes 1 year, 4 months, 11 days, 2 hours, and roughly 30 minutes to roll over the TID.

Backing up and restoring into a fresh database resets the TID. Until recently, a neglected database would have had other trauma before the TID series exhausted itself. Now, with larger page sizes, larger disks, and a reduced need to watch the size of data-base files, the risk of blowing a TID series is more apparent.

“Interesting Transactions”

Server and client transaction accounting routines use TIDs to track the states of transactions. “Housekeeping” routines take the age of transactions into account when deciding which old record versions are “interesting” and which are not. Uninteresting transactions can be flagged for removal. The server remains “interested in” every transaction that has not been hard-committed by a COMMIT statement.

Active, limbo, rolled back, and “dead” transactions are all interesting. Transactions that have been committed using COMMIT WITH RETAIN (aka soft commit or CommitRetaining) remain active until they are hard-committed and are thus interesting. Conditions can develop in which interesting transactions become “stuck” and inhibit performance.

If neglected, stuck transactions will become the source of serious performance degradation. A stuck OIT will cause the number of transaction inventory pages to grow. The server maintains a bitmapped working table of the transactions stored in the TIPs. The table is copied and written to the database at the start of each transaction. As it becomes bloated by stuck transactions, it uses progressively more memory resources and memory becomes fragmented from constant reallocation of the resources.

Oldest Interesting Transaction

The OIT is the lowest numbered transaction in the TIPs that is in a non-committed state.

Oldest Active Transaction

The OAT is the lowes numbered transaction in the TIPs that is active. A transaction is active as long as it is not hard-committed, not rolled back, and not in limbo.

Read-Only Transactions

A read-only transaction remains active (and interesting in some ways) until it is committed. However, an active read -only transaction that is in the recommended READ COMMITTED isolation level never gets stuck and will not interfere with system housekeeping.

Background Garbage Collection

Record versions from rolled-back transactions are purged from the database when they are found in the course of normal data processing. As a rule, if a row is accessed by a statement, any uninteresting old record versions that are eligible for removal will be tagged for collection by the garbage collection thread.

Some “corners” of the database might not be visited for a long time by any statements, so there is no guarantee that all of the record versions created by a rolled-back transaction will be tagged and, eventually, removed. As long as the record versions remain, the interesting transaction must remain “interesting” to preserve the state of the database.

Rolled-Back Transactions

Transactions in the rolled-back state are not garbage collected. They remain interesting until a database sweep tags them as “committed” and releases them for garbage collection. In systems with low contention, a periodic manual sweep may be all that is needed to deal with them.

Some systems that are not well designed to deal with conflicts demonstrate high levels of rollback and tend to accumulate interesting transactions faster than the automatic database housekeeping procedures can cope with. Such systems should be subjected to scheduled manual sweeps often if automatic sweeping seems not to manage well.

“Dead” Transactions

Transactions are said to be “dead” if they are in an active state, but there is no connection context associated with them. Transactions typically “die” in client applications that do not take care to end them before disconnecting users. Dead transactions are also part of the flotsam left behind when client applications crash or network connections are broken abnormally.

The server cannot tell the difference between a genuinely active transaction and a dead one. As long as the cleanup following connection timeout is able to proceed and the regular garbage collection kicks in in a timely fashion, dead transactions need not be a problem. The timeout cleanup will roll them back and their garbage will eventually be dealt with normally.

Frequent, large accumulations of dead transactions can be a problem. Because they are rolled back, too many of them remain interesting for too long. In a system where user behavior, frequent crashes, power cuts, or network faults generate huge numbers of dead transactions, dead-transaction garbage will become a big problem for performance.

Limbo Transactions

Limbo transactions, which are discussed in more detail in the next chapter, happen in the course of a failed two-phase COMMIT operation across multiple databases. The system recognizes them as a special case for human intervention, since the server itself cannot determine whether it is safe to either commit them or roll them back without causing inconsistency in a different database.

The only way to resolve a limbo transaction is to run the gfix tool over the database with the appropriate switches to achieve the desired outcome. Resolution changes the state of a limbo transaction to either “committed” or “rolled back.” From that point, it is managed just like any other committed or rolled-back transaction.

Keeping the OIT and OAT Moving

The admonition to “Keep the OIT and OAT moving” is a catchphrase around all support solutions where performance is the problem. Time spent understanding the life cycle of transactions in Firebird’s multi-generational architecture will be one of your best investments for future work with Firebird and other open source databases.

To begin understanding the interaction between the client processes that “own” transactions, on the one hand, and the transaction inventory accounting maintained in the database, on the other, it is useful to be acquainted with the way the server and transactions interact in the mechanism.

The Transaction State Bitmap

The internal transaction state bitmap (TSB) is a table of TIDs and their states, maintained by the server, that is initialized when an initial attachment is made to a database. In logical terms, the TSB tabulates each transaction found in the TIPs that is newer than the OIT. While there are attachments to the database, the server process maintains the TSB dynamically, adding newer TIDs, updating states, and sloughing off the TIDs that become uninteresting (i.e., get committed). It writes updates to the TIPs without reading them again from disk.

Figure illustrates the steps.

Each time the server processes a request for a transaction handle, it reads the database header page to acquire the TID for the next transaction. The TSB is updated and updates are written to the transaction inventory pages in the database.

Interaction of the client/server process and the TSB

Interaction of the client/server process and the TSB

“Moving the OIT (and/or the OAT) forward” is Firebird-speak for the evolving increase in the values of the OIT and OAT in the bitmap and in the database header page as older transactions get committed and are eliminated from the TSB and newer transactions become identified as “oldest.” Updating the database header page with the latest OIT, OAT, and Next Transaction values is part of this dynamic ecology.

If the new transaction is a SNAPSHOT transaction, it will have its own copy of the TSB to maintain a consistent view of database state as it was when it began. A READ COMMITTED transaction always refers to the latest “global” TSB to get access to versions committed after it began.

Conditions for Updating the OIT and OAT

Each time the server starts another transaction, it tests the state of the TIDs it is holding in the TSB, eliminates those whose state has changed to “committed,” and re-evaluates the OIT and OAT values. It compares them with those stored in the data-base page and, if necessary, includes them in the data accompanying the write of the new “Next Transaction” ID it sends to the database header.

The OAT will keep moving forward if transactions are kept short enough to prevent active transactions and garbage from committed newer transactions from piling up. The OIT will keep moving forward as long as client processes are committing substantially more work than they are rolling back or losing as the result of crashes. Under these conditions, database performance will be in good shape.

A stuck OAT is worse than a stuck OIT. In a well-performing system, the difference between the OAT and the newest transaction should be a reasonably steady approximation of the number of client processes running times the average number of transactions running per process. Sweep in off-work hours or keep automatic sweeping in force, or do both.

The “Gap”

The “gap” is another piece of Firebird-speak. It refers to the difference between the OIT and the OAT. The gap will be small in comparison with overall transaction throughput or, ideally, zero. In these conditions, it can be a reasonable assumption that there are no transactions hanging around that are causing the TSB to bloat and the TIPs to proliferate outlandishly.

It is not the gap itself that impedes performance. The gap is an indicator of the volume of overhead that sub -optimal transaction management is adding to database activity—overuse and fragmentation of memory, excessive numbers of page-reads during searches, and new page-allocations during updates and inserts. Solving and avoiding problems of degrading performance is all about controlling and reducing the gap.

Sweeping vs. Garbage Collection

Garbage collection (GC) is a continual background process that is a function of the normal course of record retrieval and record-version checking that is performed for each transaction. When obsolete record versions are found with TIDs lower than the OAT, one of two things will happen:

  • On Classic server, obsolete record versions are removed immediately. This is referred to as cooperative garbage collection, because each transaction and each server instance participates in clearing garbage left behind by others.
  • On Superserver, obsolete record versions are “tagged” in an internal list of items for removal by the GC thread. When a GC thread is “woken up,” it will deal with the items in this list and update it.

Sweeping performs this task too, but, unlike GC, it can deal with one category of interesting transactions: those that are in “rolled-back” state. It can also remove the “stumps” of deleted records and release the space for reuse.

The gap is important for automatic sweeping because the sweep interval setting of the database governs the maximum size the gap is allowed to be, before a sweep is triggered off. Automatic sweeping is a backstop that is rarely or never triggered in some databases because the gap never reaches the threshold number.

By default, each database is created with a sweep interval (maximum gap size) of 20,000. It can be varied up or down, if necessary, or disabled altogether by setting the interval to 0.

It should be stressed that a gap that is consistently smaller than the sweep interval is not an indication that the database never needs sweeping. All databases must be swept—it is a question of human management whether they are swept automatically, or manually with gfix, or both. Think of automatic sweeping as a safety net, not as a substitute for sensible database management.

Transaction Statistics

Firebird has some useful utilities for querying how well your database is managing the gap between the OIT and the OAT. You can use either to inspect the values on the data-base header page.

gstat

The command-line tool gstat, used with the –header switch, reveals a number of database statistics, including the current TIDs of the OIT, the OAT, and the next new transaction. To use gstat, log in as SYSDBA on the host machine in a command shell and go to your Firebird bin directory. Type the following in Windows:

gstat -h <path-to-database> -user sysdba -password masterkey

Type this in POSIX:

./gstat -h <path-to-database> -user sysdba -password masterkey

This is an extract from near the top of the output:

Oldest transaction 10075
Oldest active 100152
Oldest snapshot 100152
Next transaction 100153

The terms in the report can be confusing. When looking at the gstat output, note that

  • The oldest transaction is the OIT.
  • The oldest active is obviously the OAT.
  • The oldest snapshot is usually the same as the OAT—it gets written when the OAT moves forward. It is the actual TID that the garbage collector reads as a signal that there is some garbage that it can handle.

isql

You can get a similar view of the database header statistics in an isql session, using the SHOW DATABASE command.

Many of the third-party Firebird admin tools provide equivalent reports.

What the Statistics Can Tell You

It is a “given” that no GC will ever be performed on obsolete record versions left behind by interesting transactions. However, the oldest snapshot marks the boundary where the garbage collector stops looking for committed transactions. Any garbage from that transaction number upward will be unattended.

If the gap between the OAT and the Next Transaction indicates a much higher number of transactions than you can account for by an enumeration of logged-in users and their tasks, you can be certain that a lot of garbage is missing the garbage collector. As that gap keeps growing, database performance becomes more and more sluggish. Servers have been known to throw an “Out of memory” error or just crash because the TSB exhausted memory or caused too much fragmentation for the system’s memory management services to handle. Low -end servers —especially those being used to deliver other services —may not even have enough resources to record the evidence in the log.

If gaps, either between the OIT and the OAT or between the OAT and the Next Transaction, seem to be a problem in your system, you can learn a lot about the effects of sweeping the database and improving your client applications if you keep logs of the statistics.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics