Sweeping - Firebird

Firebird’s multi-generational architecture creates the situation where multiple versions of data rows are stored directly on the data pages. Firebird keeps the old versions when a row is updated or deleted. In the normal course of events, obsolete record versions created by updates are cleaned up by background garbage collection. However, under some conditions, these old versions can get “stuck” and accumulate, causing the database file(s) to grow out of proportion to the size of accessible data. Sometimes, stuck transactions will impact performance.

Sweeping is a systematic way to remove outdated rows from the database and preventit from growing too large. By default, Firebird databases are always set up to be swept automatically when certain conditions occur. However, because performance can be affected during a sweep, sweeping can be tuned to optimize its benefits while minimizing its impact on users.

It can be a positive tuning strategy to disable automatic sweeping and take charge of it yourself. You can monitor the database statistics and perform manual sweeps, either on an “as required” basis or at scheduled times. You can, for example, include a sweep command in a cron script or scheduled batch file.

For information about how database statistics reports can help in the analysis of the sweeping requirements in your database, refer to the section “Getting Index Statistics,” near the end of Chapter Indexes.

Garbage Collection

Firebird performs garbage collection (GC) in the background to limit the database growth from obsolete record versions. GC frees up space allocated to outdated versions of a row as soon as possible after the row is freed from any transactions that it was involved in. Transactions kick off GC when they encounter back versions of rows discarded by other transactions. Deleted rows and abandoned versions left after rollbacks escape this garbage collection. Rows that are infrequently touched will cause back versions to accumulate, too.

GC also happens whenever the database is backed up using gbak, since gbak’s task touches every row in every table. However, gbak doesn’t perform a full sweep. Like the regular GC, it leaves deleted and rolled-back versions alone. Sweeping is the only way to get rid of these, short of restoring the database from a backup.

Sweep Interval

Sweep interval is an integer setting in the database that specifies the threshold for a particular set of conditions that will trigger off an automatic sweep.

The Firebird server maintains an inventory of transactions. Any transaction that is in any state except committed is known as an interesting transaction. The oldest of these “interesting” transactions (oldest interesting transaction, or OIT) marks the starting point for a condition known as “the gap.”

The opposite end of the gap is the oldest transaction that is still active: the oldest active transaction, or OAT. Thegap is thus the difference between the OIT and the OAT. When the size of the gap reaches the number specified as the sweep interval, anautomatic sweep will occur next time a new transaction starts.

Databases are created with a sweep interval of 20,000.

It is a subtle but important distinction that the automatic sweep does not occur every 20,000 transactions. It occurs when the difference between the OIT and the OAT—the gap—reaches the threshold. If database applications take good care of committing every transaction promptly, the OAT will keep increasing until the gap reaches the threshold of the sweep interval and automatic sweep will be kicked off.

Changing the Sweep Interval

Changing the sweep interval has little effect on database size unless the database has accumulated a lot of rolled -back transactions. However, if you are seeing an increase in transaction startup times as the time since the last sweep increases, then lowering the sweep interval might help to reduce the buildup of rollback artifacts.

If the sweep interval is too low, application performance might tend to go down because of too-frequent sweeping. Raising the sweep interval could help improve overall performance in this case.

The option switch for setting the sweep interval is –h[ousekeeping] n, where n represents the count (interval) that you want to change to.

gfix -h 10000 /data/accounts.fdb -user SYSDBA -pas masterkey

sets the new sweep interval for accounts.fdb to 10000.

Disabling Automatic Sweeping

You might consider disabling the automatic sweep if you need to avoid the occasional, unpredictable delays imposed by automatic sweeps. Disabling it is not recommended unless back-version housekeeping is being managed effectively by alternative means, such as monitoring statistics and running regular manual sweeps.

Automatic sweeping can be disabled by setting a sweep interval of 0.

./gfix -h 0 /data/accounts.fdb -user SYSDBA -pas masterkey

or (Windows)

gfix -h 0 d:\data\accounts.fdb -user SYSDBA -pas masterkey

sets the sweep interval to 0, thus disabling automatic sweeping.

Performing a Manual Sweep

A manual sweep can be done at any time to release space held by back versions, especially record versions left behind by rollbacks and deletions. It is common to schedule sweeps at times of low activity on the database server, to avoid competing with clients for resources.

You may wish to do your own sweeping if

  • You are monitoring the gap and want to choose the appropriate time to housekeep the “sticky” back versions.
  • You think occasional updates of infrequently visited records have built up a backlog of uncollected garbage.
  • A large run of deletions has been done and you want to shift the garbage promptly.

To start an immediate sweep:

gfix -sweep C:\data\accounts.fdb -user SYSDBA -pas masterkey

or (POSIX)

./gfix -sweep /data/accounts.fdb -user SYSDBA -pas masterkey

Exclusive Access for Manual Sweeps

Sweeping a database does not strictly require it to be shut down—it can be done at any time—but it can impact system performance and should be avoided at busy times.

There is a benefit if a sweep is performed with exclusive access and with all clients’ work committed. Under these conditions, not only is more memory available to the sweep operation, but also the sweep is able to do a thorough cleanup of the states of data records and transaction inventory. Unresolved transactions are finally rendered obsolete and the resources being used to track them are freed.

We look next at using gfix to shut down a database and gain exclusive access.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics