Designing a Database - Firebird

Although relational databases are very flexible, the only way to guarantee data integrity and satisfactory database performance is a solid database design—there is no built-in protection against poor design decisions. A good database design

  • Satisfies the users’ content requirements for the database. Before you can design the database, you must do extensive research on the requirements of the users and how the database will be used. The most flexible database designs today evolve during a well-managed process of analysis, prototyping, and testing that involves all of the people who will use it.
  • Ensures the consistency and integrity of the data. When you design a table, you define certain attributes and constraints that restrict what a user or an application can enter into the table and its columns. By validating the data before it is stored in the table, the database enforces the rules of the data model and preserves data integrity.
  • Provides a natural, easy-to-understand structuring of information. Good design makes queries easier to understand, so users are less likely to introduce inconsistencies into the data or to be forced to enter redundant data.
  • Satisfies the users’ performance requirements. Good database design ensures better performance. If tables are allowed to be too large (wide), or if there are too many (or too few) indexes, long waits can result. If the database is very large with a high volume of transactions, performance problems resulting from poor design are magnified.
  • Insulates the system from design mistakes in subsequent development cycles.

Description and Analysis

A database abstractly represents a world of organization, relationships, rules, and processes. Before reaching the point of being capable of designing the structures and rules for the database, the analyst/designer has much to do, working with the people involved to identify the real-life structures, rules, and requirements from which the database design will be rendered. The importance of scrupulous description and analysis cannot be emphasized too strongly.

Logical data analysis is an iterative process of refining and distilling the world of inputs, tasks, and outputs whose scope is to be encompassed by the database. Large, haphazard structures of information are reduced progressively to smaller, more specialized data objects and are gradually mapped to a data model.

An important part of this reduction process involves normalization—splitting out groups of data items with the goal of establishing essential relationships, eliminating redundancies, and associating connected items of data in structures that can be manipulated efficiently.

This phase can be one of the most challenging tasks for the database designer, especially in environments where the business has been attuned to operating with spreadsheets and desktop databases. Regrettably, even in established client/server environments, too many poorly performing, corruption-prone databases are found to have been “designed” using reports and spreadsheets as the basis.

Data Model <> Database

The “world” that evolves during description and analysis provides a logical blueprint for your data structures. It is a given that the logical model should discover every relationship and set. It is usually a mistake —and a trap inherent in many CASE tools —to translate the data model blindly into a database schema. In sophisticated data management systems like Firebird, a table structure does not always represent the optimal object from which data should be retrieved. Queries, views, arrays, calculated columns, and “selectable” stored procedures are just a few of the retrieval and storage mechanisms available that will influence how you implement the model in the physical design.

Even an excellent data model will lack flexibility and will underperform if it does not take into account the power and economy of the server’s dynamic capabilities. Dynamic structures for the selection and manipulation of data are the arteries of a client/server database.

One Database or Many?

A single Firebird server—with the exception of the local Embedded Server on Windows—can control multiple databases within its own physical filesystem. It is not unusual in large enterprises to run multiple databases to serve separated divisional subsystems. Because one database is not aware of the objects and dependencies in another, it takes careful design, planning, and balancing of system resources and network services to integrate these independent systems. Typically, such databases are synchronized periodically by a replication system.

When designing, bear in mind that Firebird does not support queries that join or union tables across database boundaries. However, it does support simultaneous queries across multiple databases within one single transaction, with two -phase commit. It is thus possible for applications to accomplish tasks that work with data images from two or more databases and perform DML on one database using data read from another.

The Physical Objects

Tables

A database table is usually visualized as a two-dimensional block consisting of columns (the vertical dimension) and rows (the horizontal dimension). The storage attributes of individual items of data are specified in columns (usually related to or dependent upon one another in some way) and rows. A table can have any number of rows (up to a limit of 232) or even no rows at all. Although every row in one table shares the specification of its columns with every other row, rows do not depend on other rows in the same table.

Files and Pages

If you are moving to Firebird from a database system that implements tables by physically tabulating columns and rows in the filesystem, Firebird may bring some surprises. In Firebird, all data belonging to a single database is stored in one file or a set of linked files. In multi-file databases, there is no correlation between any specific database object and a particular member of the database file-set.

Within the boundaries of the file, the Firebird server engine manages evenly sized blocks of disk known as database pages. It manages several different page “types,” according to the type of data it needs to store—regular columns for tables, BLOBs, and indexes, for example. The engine allocates fresh blocks to itself from the host filesystem as required. All pages are the same size, regardless of type. Page size must be specified in the CREATE DATABASE statement. It cannot be altered except by backing up the database and reconstructing it with a new page size, using the gbak utility.

Unlike the file-based data management systems, Firebird does not maintain table data in a tabulated format at all. Rows from one table may not be stored contiguously with other rows from the same table. Indeed, row data for a single table may be distributed among several files and several disks. The engine uses various types of inventory pages to store information about the physical locations of rows belonging to each table.

Columns and Fields

Abstractly, a column is a constellation of attributes, defining the data item that can be stored in one specific cell location in the left -to -right structure of a table’s row.

However, columns don’t just exist in tables in the database. Each time a query is submitted to the database engine for processing, that query specifies a set of columns and one or more operations to be carried out on those columns. The columns do not have to be in the same left -to -right order as is defined for them in the table. For example, the statement

SELECT FIELD3, FIELD1, FIELD2 FROM ATABLE;

will output a set in the column order specified in the query. The query may specify columns from multiple tables, through joins, subqueries, and unions. It may define columns that do not exist in the database at all, by computing them or even just by specifying them as named constants.

Some people use the term “field” when referring to a column, for example, “I have a table TABLE1 that has three fields.” Relational database textbooks often discourage the use of “field” as a substitute for “column,” with some preferring to use “field” to mean “the value in the column” or “the reference to a column.”

In this guide, “field” is used only as a term to generalize the concepts of column, argument, and local variable, and to refer to output items that are constructed at runtime. “Column” is used to refer to the physical columns defined for tables.

Keys

The Primary Key

An essential part of the database design process is to abstract the logical model of the database to the point where, for each table, there is a single, unique column or composite column structure that distinguishes each row from every other row in the table. This unique column or column combination is the logical primary key. When you implement your physical model, you use the PRIMARY KEY constraint to tell the DBMS which column or columns form this unique identifying structure. You may define only one PRIMARY KEY constraint per table.

Other Unique Keys

It can happen in your data modeling process that, for various reasons, you end up with more than one unique column or structure in a table. For enforcing the required uniqueness on such columns or structures, Firebird provides the optional UNIQUE key constraint. It is effectively an alternative primary key and can be used in lieu of the primary key at times, if required.

Foreign Keys

The “cables” that make a relational database “relational” are foreign keys. This is the column or column structure that shows up in your data model on the “many” side of a one-to-many relationship. In the physical design, it matches up with the column or column structure of the primary key of the table on the “one” side of the relationship. For example, in the following simple model, the detail lines of an order are linked to the order header by the ORDER NUMBER key.

Simple relational link

Simple relational link

This model requires that each header row have a unique ORDER _NUMBER and that at least one order detail row exists for each order header row. Other rules may apply to the existence and linking. Firebird provides powerful trigger procedure capabilities for setting, conditioning, and applying rules to relationships. Additionally, it can automate many of the typical rules governing relationships, using the FOREIGN KEY constraint with its optional action arguments. Underlying this constraint are system-generated referential integrity triggers.

Surrogate Keys

The column that your analysis determines to be the primary key, or an element of the primary key, almost always stores a data item that has some meaning. Take, for example, a table storing personal details:

CREATE TABLE PERSON ( FIRST_NAME VARCHAR(30) NOT NULL, LAST_NAME VARCHAR(50) NOT NULL, PHONE_NUMBER VARCHAR(18) NOT NULL, ADDRESS_1 VARCHAR(50), ...);

The designer decides that the combination (FIRST_NAME, LAST_NAME, PHONE_NUMBER) is a good candidate for the primary key. People do share phone numbers, but it is extremely unlikely that two people with identical first and last names would share the same number, right? So, the designer does this:

ALTER TABLE PERSON ADD CONSTRAINT PK_PERSON(LAST_NAME, FIRST_NAME, PHONE_NUMBER);

The first problem with this primary key is that every element has meaning. Every element is maintained by humans and may change or be misspelled. The two keys ('Smith', 'Mary','43889474') and ('SMITH','Mary', '43889474') are not the same and will both be capable of being stored in this table. Which record gets changed if Mary gets married or changes her phone number?

The second problem is that this complex key has to be propagated, as a foreign key, to any tables that are dependent on PERSON. Not only is the integrity of the relationship at risk through alterations or errors in the data, but also it is a broad channel—potentially 98 characters—across which to implement the foreign key relationship.

The real showstopper may occur if these columns use multi-byte character sets or non-binary collations. Index widths are limited to 253 bytes. Mandatory indexes are created to enforce keys. Such a key will be impossible because it is simply too wide.

Making Keys Atomic

An important tenet of good relational database design is atomicity. In the context of primary and foreign keys, atomicity means that no key should have any meaning as data; it should have no other role or function except to be a key.

The solution is to add an extra column to tables to accommodate an artificial or surrogate primary key: a unique, narrow column, preferably system-generated, that replaces (surrogates) the function of the theoretical primary key. Firebird provides GENERATOR objects, which can be implemented to maintain the required unique series of BIGINT numbers, a primary key of a mere 8 bytes or less.

Summary of Surrogate Keys vs. Natural Keys

Database developers tend to take strong positions in the arguments for and against using artificial keys. The author’s position in favor of atomicity is probably evident. However, in the interest of fairness, the arguments for and against are summarized in Table.

Surrogate (Artificial) Keys vs. Natural Keys

Surrogate (Artificial) Keys vs. Natural Keys

Should you design databases with a mix of natural and artificial keys? The extreme view is to advise a consistent design approach—choose natural or artificial and apply the rule without exception. Yet a more moderate approach may offer the best of both worlds. It may be realistic to use a natural key for stable lookup or “control” tables that rarely change, are never required to participate in compound key structures, and appear often in output.

When designing keys for a Firebird database, be mindful that keys are enforced by indexes, and indexes in Firebird have a size limit of 253 bytes. Compounding, collation sequences and multi-byte international character sets reduce the number of characters of actual data that can be accommodated in an index.

Keys Are Not Indexes

Indexes are not keys. Keys are table-level constraints. The database engine responds to constraint declarations by creating a number of metadata objects for enforcing them. For primary keys and unique constraints, it creates a unique index on the column(s) assigned to the constraint. For foreign keys, it creates a non-unique index on the assigned columns, stores records for the dependency, and creates triggers to implement the actions.

  • The keys are the constraints.
  • The indexes are required to enforce the constraints.

Referential Integrity

Accidental altering or deletion of rows that have dependencies will corrupt the integrity of your data. Referential integrity, generally, is a qualitative expression that describes the degree to which dependencies in a database are protected from corruption. However, in the context of this guide, it refers to the in-built mechanisms for enforcing foreign key relationships and performing the desired actions when the primary key of a master row is changed or the row is deleted.

Indexes and Query Plans

If foreign keys are the “cables” that make a database relational, then indexes can be seen as the suppliers of “bandwidth.” Good indexing speeds things up; missing or bad indexes will slow down searches, joins, and sorting.

As a relational database management engine, Firebird can link almost any column object to almost any other column object (the exceptions being the various BLOB types, including ARRAYS) by reference to their identifiers. However, as the numbers of rows, linking columns, and tables in a query increase, performance goes down.

When columns that are searched, joined, or sorted are indexed in useful ways, performance in terms of execution time and resource usage can be dramatically improved. It must also be said that poor indexing can hurt performance!

Firebird uses optimization algorithms that are largely cost-based. In preparing a submitted query, the optimizer calculates the relative costs of choosing or ignoring available indexes and returns a query plan to the client, reporting its choices. Although it is possible to design and submit your own plan to the optimizer—an important feature in RDBMS engines that use rule-based optimization—as a general rule the Firebird optimizer knows best. Firebird plans tend to be most useful in detecting and eliminating problem indexes.

Views

Firebird provides the capability to create and store pre-defined query specifications, known as views, which can be treated in most ways just as though they were tables. A view is a class of derived table that stores no data. For many tasks —especially those where access to certain columns in the underlying tables needs to be denied or where a single query specification cannot deliver the required degree of abstraction —views solve difficult problems.

Stored Procedures and Triggers

Stored procedures and triggers are modules of compiled, executable code that are executed on the server. The source code is a set of SQL language extensions known as Procedural SQL, or PSQL.

Stored procedures can be executable or selectable. They can take input arguments and return output sets. Executable procedures execute completely on the server and optionally return a single-row set (a singleton) of constants on completion. Selectable procedures generate multiple-row sets of zero or more rows, which can be used by client applications in most ways like any other output set.

Triggers are a specialized form of PSQL module that can be declared to execute at one or more of six stage/operation phases (before and after inserting, updating, and deleting) during a data manipulation (DML) operation on the table that owns them. Clusters of triggers can be defined for each phase, to execute in a defined sequence. From release 1.5 onward, the behavior for any or all DML operations can be combined, with conditions, in a single “before” or “after” trigger module. Triggers do not accept input arguments nor return output sets.

Stored procedures can call other stored procedures. Triggers can call stored procedures that, in turn, can call other stored procedures. Triggers can be called from neither a client application nor a stored procedure.

PSQL provides mechanisms for exception handling and callback events. Any number of exception messages can be defined as objects in the database using CREATE EXCEPTION statements. Callback events are created inside the PSQL module and applications can set up structures to “listen” for them.

Database Object Naming Conventions and Constraints

The following limitations on naming database objects must be observed:

  • Start each name with an alphabetic character (A–Z or a–z).
  • Restrict object names to 31 characters. Some objects, such as constraint names, are restricted to 27 bytes in length.
  • Allowable characters for database file names—as with all metadata objects in Firebird—include dollar signs ($), underscores (_), 0 to 9, A to Z, and a to z.
  • Observe uniqueness requirements within a database:
  • In all cases, objects of the same type —all tables, for example— must be unique.
  • Column identifiers must be unique within a table. All other object identifiers must be unique within the database.
  • Avoid the use of reserved words, spaces, diacritic characters, and any ASCII characters having codes higher than 127:
  • In dialect 1, they cannot be used at all.
  • In dialect 3, you can delimit “illegal” identifiers using pairs of double-quote symbols. Details follow.

Optional SQL-92 Delimited Identifiers

In dialect 3 databases, Firebird supports the ANSI SQL convention for optionally delimiting identifiers. To use reserved words, diacritic characters, case-sensitive strings, or embedded spaces in an object name, enclose the name in double quotes. It is then a delimited identifier. Delimited identifiers must always be referenced in double quotes.

Names enclosed in double quotes are case sensitive, for example:

SELECT "CodAR" FROM "MyTable"

is different from

SELECT "CODAR" FROM "MYTABLE"

To Quote or Not to Quote

The double-quoting convention for object identifiers was introduced for compliance with standards. To those who have been used to the global case-insensitivity of InterBase in the past, the new “feature” is at best confusing and at worst exasperating.

If you define objects with double quotes, you must use them everywhere and every time with double-quotes and perfect case-matching. Most experienced Firebird developers recommend avoiding them, except in the occasional cases where you are stuck with using an “illegal” identifier. The choice is yours.

The Case-Matching Exception

If you have double-quoted identifiers in all uppercase, you can use them in SQL without the quotes and treat them as case insensitive. The ability to do this comes from the way identifiers are stored in the internal schema tables and the sequence that the engine follows to resolve them during retrieval.

Database File-Naming Conventions

The established convention for naming Firebird database files on any platform is to apply the three-character suffix .fdb to the primary file and to name secondary files .f01, .f02, etc. This is only a convention—a Firebird database file can have any extension or no extension at all.

Because of known problems on XP servers involving the SystemRestore feature actively targeting files with the suffix .gdb, developers are advised to change the traditional InterBase file suffix on databases migrating to Firebird.

The name of the security database—security.fdb in release 1.5 and higher, isc4.gdb in release 1.0.x—must not be changed. Unfortunately, Firebird 1.0.x has no workaround for its required .gdb suffix.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics