String Essentials - Firebird

The CHARACTER SET attribute of character types is important not only for compatibility with localized application interfaces but also, in some cases, for deciding the size of the column. Certain character sets use multiple bytes—typically 2 or 3 bytes in Firebird—to store a single character. When such character sets are used, the maximum size is reduced by a factor of the byte size.

String Delimiter

Firebird’s string delimiter is ASCII 39, the single-quote or apostrophe character, for example:

Double -quotes are not permitted at all for delimiting strings. You should remember this if you are connecting to Firebird using application code written for InterBase 5 databases, which permitted double-quoted strings. Strings should also be corrected in the source code of stored procedures and triggers in an InterBase 5 database if you plan to recompile them in Firebird.

Concatenation

Firebird uses the SQL standard symbol for concatenating strings: a doublet of ASCII characters, code 124, known as double-pipe (||). It can be used to concatenate string constants, string expressions, and/or column values, for example:

Character items can be concatenated to numbers and number expressions to produce an alphanumeric string result. For example, to concatenate the character '#' to an integer:

Escape Characters

As a rule, Firebird does not support escape characters as a means to include non-printable codes or sequences in character fields. The single exception is the “doubling” of the apostrophe character (ASCII 39) to enable it to be included as a stored character and prevent its being interpreted as the end-delimiter of the string:

It is possible to store non-printable characters in strings. The UDF AsciiChar(asciivalue) in the ib_udf library can be declared to enable these characters or sequences to be passed in strings. The following statement outputs a set of text fields—to an external file, for example—with a carriage return and line feed in the last one:

For the declaration of Ascii_Char(..) and other functions in the ib_udf library, look in the ../UDF subdirectory beneath the root of your Firebird installation for the script named ib_udf.sql

Limitations with Character Types

Multi-Byte Character Set Limitations

It is important to be aware of the impact of multi-byte character sets on the sizes of text items, especially those with variable length. In UNICODE_FSS, for example, even a 256-character column will be relatively large —potentially 770 bytes—both to store and to retrieve. More is said later in this chapter regarding the caution you need to observe when considering text storage for multi-byte character data.

Index Restrictions

When deciding on the length, character set, and collation sequence for a character column, you need to be aware that indexing columns of these types is somewhat limited. Currently (at v.1.5) the total width of any index cannot exceed 253 bytes—note bytes, not characters. Multi-byte and many of the more complex 1-byte character sets use up many more bytes than the simpler character sets. Multi-segment indexes consume extra bytes, as do collation sequences. Do the byte calculations at design time!

Client Memory Consumption

Client programs will allocate memory to store copies of rows that they read from the database. Many interface layers allocate sufficient resources to accommodate the maximum (i.e., defined) size of a fixed- or variable-length column value, even if none of the actual data stored is that large. Buffering large numbers of rows at a time may consume a large amount of memory, and users will complain about slow screen refreshes and lost connections.

Consider, for example, the impact on the workstation if a query returns 1024 rows consisting of just one column declared as VARCHAR(1024). Even with the “leanest” character set, this column would cost at least 1MB of client memory. For a Unicode column, multiply that cost by three.

Fixed-Length Character Data

Firebird’s fixed-length string data types are provided for storing strings whose length is consistently the same or very similar, or where the format or relative positions of characters might convey semantic content. Typical uses are for items such as identification codes, telecom numbers, and character-based numbering systems, and for defining fields to store pre-formatted fixed-length strings for conversion to other data types— Firebird date literals, for example.

Leading spaces characters (ASCII character 32) in fixed-length string input are significant, whereas trailing spaces are not. When storing fixed-length strings, Firebird strips trailing space characters. The strings are retrieved with right-padding out to the full declared length.

Using fixed-length types is not recommended for data that might contain significant trailing space characters or items whose actual lengths are expected to vary widely.

CHAR(n), Alias CHARACTER(n)

CHAR(n), alias CHARACTER(n), is the base fixed-length character type. n represents the exact number of characters stored. It will store strings in any of the supported character sets.

NCHAR(n), Alias NATIONAL CHARACTER(n)

NCHAR(n), alias NATIONAL CHAR(n) is a specialized implementation of CHAR(n) that is pre-defined with ISO8859_1 as its character set attribute. Of course, it is not valid to define a character set attribute for an NCHAR column, although a collation sequence— the sequence in which the sorting of characters is arranged for searches and ordered output—can be defined for a column or domain that uses this type. A detailed section about character sets and collation sequences follows later in this chapter.

Variable-Length Character Data

Firebird’s variable-length string data types are provided for storing strings that may vary in length. The mandatory size argument n limits the number of discrete characters that may be stored in the column to a maximum of n characters. The size of a VARCHAR cannot exceed 32,765 bytes because Firebird adds a 2-byte size element to each VARCHAR item.

Choosing, Storing, and Retrieving Variable-Length Text

A variable-length character type is the workhorse for storing text because the size of the stored structure is the size of the actual data plus 2 bytes. All characters submitted as input to a field of a variable-length type are treated as significant, including both leading and trailing space characters (“blanks”).

Transport Across the Network

Until Firebird 1.5, retrieved variable-length text data items were padded at the server to the full, declared size before being returned to the client. From Firebird 1.5 forward, the data are not padded. At the time of this writing, the unpadded variable-length text output feature had not been backported to Firebird 1.0.x, which may influence your choice of column size and type if you are writing applications for remote clients connecting to a 1.0.x server on a slow network.

Although variable-length types can store strings of almost 32K, in practice it is not recommended to use them for defining data items longer than about 250 bytes, especially if their tables will become large or will be subject to frequent SELECT queries. BLOBs of SUB_TYPE 1 (text) are usually more suitable for storing large string data. Text BLOBs are discussed in detail in the next chapter.

VARCHAR(n), Alias CHARACTER VARYING(n)

VARCHAR(n), alias CHARACTER VARYING(n), is the base variable-length character type. n represents the maximum number of characters that can be stored in the column. It will store strings in any of the supported character sets. If no character set is defined, the character set attribute defaults to the one defined during CREATE DATABASE as DEFAULT CHARACTER SET. If there is no default character set, the column default is CHARACTER SET NONE.

NCHAR VARYING(n), Alias NATIONAL CHAR VARYING(n)

NCHAR VARYING(n), alias NATIONAL CHAR VARYING(n), alias NATIONAL CHARACTER VARYING(n), is a specialized implementation of VARCHAR(n) that is pre-defined with ISO8859_1 as its character set attribute. It is not valid to define a character set attribute for an NVARCHAR column, although a collation sequence —the sequence in which the sorting of characters is arranged for searches and ordered output—can be defined for a column or domain that uses this type.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics