MyISAM Record Structure MySQL

Introduction

When you say:

CREATE TABLE Table1...MySQL creates files named Table1.MYD (" MySQL Data "),Table1.MYI (" MySQL Index "), and Table1.frm (" Format ").These files will be in the directory: /< datadir > /< database >/

For example,if you use Linux, you might find the files in the /usr/local/var/test directory(assuming your database name is test).if you use Windows,you might find the files in the mysqldatatest directory.

Let's look at the .MYD Data file (MyISAM SQL Data file) more closely. There are three possible formats — fixed, dynamic, and packed. First, let's discuss the fixed format.

  • Page Size

Unlike most DBMSs, MySQL doesn't store on disk using pages. Therefore you will not see filler space between rows.(Reminder: This does not refer to BDB and InnoDB tables, which do use pages).

  • Record Header

The minimal record header is a set of flags:

  • "X bit" = 0 if row is deleted, = 1 if row is not deleted
  • "Null Bits" = 0 if column is not NULL, = 1 if column is NULL
  • "Filler Bits" = 1

The length of the record header is thus:
(1 + number of NULL columns + 7) / 8 bytes

After the header, all columns are stored in the order that they were created, which is the same order that you would get from SHOW COLUMNS.

Here's an example. Suppose you say:

A CHAR(1) column takes precisely one byte (plus one bit of overhead that is assigned to every column — I'll describe the details of column storage later). So the file Table1.MYD looks like this:

Hexadecimal Display of Table1.MYD file

F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e.

Here's how to read this hexadecimal-dump display:

  • The hexadecimal numbers F1 61 62 63 00 F5 64 20 66 00 are byte values and the column on the right is an attempt to show the same bytes in ASCII.
  • The F1 byte means that there are no null fields in the first row.
  • The F5 byte means that the second column of the second row is NULL.

(It's probably easier to understand the flag setting if you restate F5 as 11110101 binary, and (a) notice that the third flag bit from the right is on, and (b) remember that the first flag bit is the X bit.)

There are complications — the record header is more complex if there are variable-length fields — but the simple display shown in the example is exactly what you'd see if you looked at the MySQL Data file with a debugger or a hexadecimal file dumper.

So much for the fixed format. Now, let's discuss the dynamic format.

The dynamic file format is necessary if rows can vary in size. That will be the case if there are BLOB columns, or "true" VARCHAR columns. (Remember that MySQL may treat VARCHAR columns as if they're CHAR columns, in which case the fixed format is used.) A dynamic row has more fields in the header. The important ones are "the actual length", "the unused length", and "the overflow pointer". The actual length is the total number of bytes in all the columns. The unused length is the total number of bytes between one physical record and the next one. The overflow pointer is the location of the rest of the record if there are multiple parts.

For example, here is a dynamic row:

In the example, the actual length and the unused length are short (one byte each) because the table definition says that the columns are short — if the columns were potentially large, then the actual length and the unused length could be two bytes each, three bytes each, and so on. In this case, actual length plus unused length is 10 hexadecimal (sixteen decimal), which is a minimum.

As for the third format — packed — we will only say briefly that:

  • Numeric values are stored in a form that depends on the range (start/end values) for the data type.
  • All columns are packed using either Huffman or enum coding.

For details, see the source files /myisam/mi_statrec.c (for fixed format), /myisam/mi_dynrec.c (for dynamic format), and /myisam/mi_packrec.c (for packed format).

Note: Internally, MySQL uses a format much like the fixed format which it uses for disk storage. The
main differences are:

  1. BLOB values have a length and a memory pointer rather than being stored inline.
  2. "True VARCHAR" (a column storage which will be fully implemented in version 5.0) will have a 16-bit length plus the data.
  3. All integer or floating-point numbers are stored with the low byte first. Point (3) does not apply for ISAM storage or internals.

Physical Attributes of Columns

Next I'll describe the physical attributes of each column in a row. The format depends entirely on the data type and the size of the column, so, for every data type, I'll give a description and an example.

  • The character data types
  • CHAR
  • Storage: fixed-length string with space padding on the right.
  • Example: a CHAR(5) column containing the value 'A' looks like: hexadecimal 41 20 20 20 20 -- (length = 5, value = 'A ')
  • VARCHAR
  • Storage: variable-length string with a preceding length.
  • Example: a VARCHAR(7) column containing 'A' looks like: hexadecimal 01 41 --(length = 1, value = 'A')
  • In MySQL 4.1 the length is always 1 byte. In MySQL 5.0 the length may be either 1 byte(for up to 255)or 2 bytes (for 256 to 65535). Some further random notes about the new format: In old tables (from MySQL 4.1 and earlier), VARCHAR columns have type MYSQL_TYPE_VAR_STRING,which works exactly like a CHAR with the exception that if you do an ALTER TABLE, it's converted to a true VARCHAR (MYSQL_TYPE_VARCHAR).(This means that old tables will work as before for users.) ... Apart from the above case,there are no longer any automatic changes from CHAR to VARCHAR or from VARCHAR to CHAR. MySQL will remember the declared type and stick to it ... VARCHAR is implemented in field.h and field.cc through the new class Field_varstring ... MyISAM implements VARCHAR both for dynamic-length and fixed-length rows (as signaled with the ROW_FORMAT flag) ... VARCHAR now stores trailing spaces.(If they don't fit, that's an error in strict mode.) Trailing spaces are not significant in comparisons ... In table->record, the space is reserved for length (1 or 2 bytes) plus data ... The number of bytes used to store the length is in the field Field_varchar->length_bytes. Note that internally this can be 2 even if Field_varchar->field_length < 256 (for example, for a shortened key to a varchar(256)) ... There is a new macro, HA_VARCHAR_PACKLENGTH(field_length), that can be used on field->length in write_row / read_row to check how many length bytes are used. (In this context we can't have a field_length < 256 with a 2-byte pack length) ... When creating a key for the handler, HA_KEYTYPE_VARTEXT1 and HA_KEYTYPE_BINARY1 are used for a key on a column that has a 1-byte length prefix and HA_KEYTYPE_VARTEXT2 and HA_KEYTYPE_BINARY2 for a column that has a 2-byte length prefix.(In the future we will probably delete HA_KEYTYPE_BINARY#, as this can be instead be done by just using the binary character set with HA_KEYTYPE_VARTEXT#.)... When sending a key to the handler for index_ read()or records_in_range, we always use a 2-byte length for the VARCHAR to make things simpler.(For version 5.1 we intend to change CHARs to also use a 2-byte length for these functions, as this will speed up and simplify the key handling code on the handler side.)... The test case file mysql-test/include/varchar.inc should be included in the code that
    tests the handler. See t/myisam.test for how to use this. You should verify the result against the one in mysql-test/t/myisam.result to ensure that you get the correct results ... A client sees both the old and new VARCHAR type as MYSQL_TYPE_VAR_STRING. It will never(at least for 5.0) see MYSQL_TYPE_VARCHAR. This ensures that old clients will work as before ... If you run MySQL 5.0 with the --new option, MySQL will show old VARCHAR columns as 'CHAR' in SHOW CREATE TABLE.(This is useful when testing whether a table is using the new VARCHAR type or not.)
  • The numeric data types
  • Important: MySQL almost always stores multi-byte binary numbers with the low byte first. This is called "little-endian" numeric storage; it's normal on Intel x86 machines; MySQL uses it even for non-Intel machines so that databases will be portable.
    TINYINT

  • Storage: fixed-length binary, always one byte.
  • Example: a TINYINT column containing 65 looks like: hexadecimal 41 -- (length = 1, value = 65)
  • SMALLINT
  • Storage: fixed-length binary, always two bytes.
  • Example: a SMALLINT column containing 65 looks like: hexadecimal 41 00 -- (length = 2, value = 65)
  • MEDIUMINT
  • Storage: fixed-length binary, always three bytes.
  • Example: a MEDIUMINT column containing 65 looks like: hexadecimal 41 00 00 -- (length = 3, value = 65)
  • INT
  • Storage: fixed-length binary, always four bytes.
  • Example: an INT column containing 65 looks like: hexadecimal 41 00 00 00 -- (length = 4, value = 65)
  • BIGINT
  • Storage: fixed-length binary, always eight bytes.
  • Example: a BIGINT column containing 65 looks like: hexadecimal 41 00 00 00 00 00 00 00 -- (length = 8, value = 65)
  • FLOAT
  • Storage: fixed-length binary, always four bytes.
  • Example: a FLOAT column containing approximately 65 looks like: hexadecimal 00 00 82 42 -- (length = 4, value = 65)
  • DOUBLE PRECISION
  • Storage: fixed-length binary, always eight bytes.
  • Example: a DOUBLE PRECISION column containing approximately 65 looks like: hexadecimal00 00 00 00 00 40 50 40 -- (length = 8, value = 65)
  • REAL
  • Storage: same as FLOAT, or same as DOUBLE PRECISION, depending on the setting of the --ansi option.
  • DECIMAL
  • MySQL 4.1 Storage: fixed-length string, with a leading byte for the sign, if any.
  • Example: a DECIMAL(2) column containing 65 looks like: hexadecimal 20 36 35 --(length = 3, value = ' 65')
  • Example: a DECIMAL(2) UNSIGNED column containing 65 looks like: hexadecimal 36 35 -- (length = 2, value = '65')
  • Example: a DECIMAL(4,2) UNSIGNED column containing 65 looks like: hexadecimal 36 35 2E 30 30 -- (length = 5, value = '65.00')
  • MySQL 5.0 Storage: high byte first, four-byte chunks. We call the four-byte chunks "*decimal* digits". Since 2**32 = 4294967296, one *decimal* digit can hold values up to 10**9 (999999999), and two *decimal* digits can hold values up to 10**18, and so on. There is an implied decimal point. Details are in /strings/decimal.c.
  • Example: a MySQL 5.0 DECIMAL(21,9) column containing 111222333444.555666777 looks like: hexadecimal 80 6f 0d 40 8a 04 21 1e cd 59 -- (flag + '111','222333444', '555666777').
  • NUMERIC
  • Storage: same as DECIMAL.
  • BOOL
  • Storage: same as TINYINT.
  • The temporal data types
  • DATE
  • Storage: 3 byte integer, low byte first. Packed as: 'day + month*32 + year*16*32'
  • Example: a DATE column containing '1962-01-02' looks like: hexadecimal 22 54 0F
  • DATETIME
  • Storage: eight bytes.
  • Part 1 is a 32-bit integer containing year*10000 + month*100 + day.
  • Part 2 is a 32-bit integer containing hour*10000 + minute*100 + second.
  • Example: a DATETIME column for '0001-01-01 01:01:01' looks like: hexadecimal
    B5 2E 11 5A 02 00 00 00
  • TIME
  • Storage: 3 bytes, low byte first. This is stored as seconds:
    days*24*3600+hours*3600+minutes*60+seconds
  • Example: a TIME column containing '1 02:03:04' (1 day 2 hour 3 minutes and 4 seconds)looks like: hexadecimal 58 6E 01
  • TIMESTAMP
  • Storage: 4 bytes, low byte first. Stored as unix time(), which is seconds since the Epoch
    (00:00:00 UTC, January 1, 1970).
  • Example: a TIMESTAMP column containing '2003-01-01 01:01:01' looks like: hexadecimal 4D AE 12 23
  • YEAR
  • Storage: same as unsigned TINYINT with a base value of 0 = 1901.
  • Others
  • SET
  • Storage: one byte for each eight members in the set.
  • Maximum length: eight bytes (for maximum 64 members).
  • This is a bit list. The least significant bit corresponds to the first listed member of the set.
  • Example: a SET('A','B','C') column containing 'A' looks like: 01 -- (length = 1, value= 'A')
  • ENUM
  • Storage: one byte if less than 256 alternatives, else two bytes.
  • This is an index. The value 1 corresponds to the first listed alternative. (Note: ENUM always reserves the value 0 for an erroneous value. This explains why 'A' is 1 instead of 0.)
  • Example: an ENUM('A','B','C') column containing 'A' looks like: 01 -- (length = 1,value = 'A')
  • The Large-Object data types
  • Warning: Because TINYBLOB's preceding length is one byte long (the size of a TINYINT) and MEDIUMBLOB's preceding length is three bytes long (the size of a MEDIUMINT), it's easy to think there's some sort of correspondence between the BLOB and INT types. There isn't — a BLOB's preceding length is not four bytes long (the size of an INT).
    TINYBLOB

  • Storage: variable-length string with a preceding one-byte length.
  • Example: a TINYBLOB column containing 'A' looks like: hexadecimal 01 41 -- (length
  • = 2, value = 'A') TINYTEXT
  • Storage: same as TINYBLOB.
  • BLOB
  • Storage: variable-length string with a preceding two-byte length.
  • Example: a BLOB column containing 'A' looks like: hexadecimal 01 00 41 -- (length =
    2, value = 'A')
  • TEXT
  • Storage: same as BLOB.
  • MEDIUMBLOB
  • Storage: variable-length string with a preceding length.
  • Example: a MEDIUMBLOB column containing 'A' looks like: hexadecimal 01 00 00 41
  • -- (length = 4, value = 'A')
    MEDIUMTEXT
  • Storage: same as MEDIUMBLOB.
  • LONGBLOB
  • Storage: variable-length string with a preceding four-byte length.
  • Example: a LONGBLOB column containing 'A' looks like: hexadecimal 01 00 00 00
    41 -- (length = 5, value = 'A')
  • LONGTEXT
  • Storage: same as LONGBLOB.

Where to Look For More Information

References:

Most of the formatting work for MyISAM columns is visible in the program /sql/field.cc in thesource code directory. And in the MyISAM directory, the files that do formatting work for different record formats are:



Face Book Twitter Google Plus Instagram Youtube Linkedin Myspace Pinterest Soundcloud Wikipedia

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

MySQL Topics