InnoDB Record Structure MySQL

This page contains:

  • A high-altitude "summary" picture of the parts of a MySQL/InnoDB record structure.
  • A description of each part.
  • An example.

After reading this page, you will know how MySQL/InnoDB stores a physical record.

High-Altitude Picture

The chart below shows the three parts of a physical record.

High-Altitude Picture

Legend: The letter 'F' stands for 'Number Of Fields'.

The meaning of the parts is as follows:

  • The FIELD START OFFSETS is a list of numbers containing the information "where a field starts".
  • The EXTRA BYTES is a fixed-size header.
  • The FIELD CONTENTS contains the actual data.

An Important Note About The Word "Origin"

The "Origin" or "Zero Point" of a record is the first byte of the Field Contents --- not the first byte of the Field Start Offsets. If there is a pointer to a record, that pointer is pointing to the Origin. Therefore the first two parts of the record are addressed by subtracting from the pointer, and only the third part is addressed by adding to the pointer.

FIELD START OFFSETS

The Field Start Offsets is a list in which each entry is the position, relative to the Origin, of the start of the next field. The entries are in reverse order, that is, the first field's offset is at the end of the list. An example: suppose there are three columns. The first column's length is 1, the second column's length is 2, and the third column's length is 4. In this case, the offset values are, respectively, 1, 3 (1+2), and 7 (1+2+4). Because values are reversed, a core dump of the Field Start Offsets would look like this: 07,03,01.

There are two complications for special cases:

  • Complication #1: The size of each offset can be either one byte or two bytes. One-byte offsets are only usable if the total record size is less than 127. There is a flag in the "Extra Bytes" part which
  • will tell you whether the size is one byte or two bytes.
  • Complication #2: The most significant bits of an offset may contain flag values. The next two paragraphs explain what the contents are. When The Size Of Each Offset Is One Byte
  • 1 bit = 0 if field is non-NULL, = 1 if field is NULL
  • 7 bits = the actual offset, a number between 0 and 127
  • When The Size Of Each Offset Is Two Bytes
  • 1 bit = 0 if field is non-NULL, = 1 if field is NULL
  • 1 bit = 0 if field is on same page as offset, = 1 if field and offset are on different pages
  • 14 bits = the actual offset, a number between 0 and 16383

It is unlikely that the "field and offset are on different pages" unless the record contains a large BLOB.

EXTRA BYTES

The Extra Bytes are a fixed six-byte header.

EXTRA BYTES

Total size is 48 bits, which is six bytes.

If you're just trying to read the record, the key bit in the Extra Bytes is 1byte_offs_flag — you need to know if 1byte_offs_flag is 1 (i.e.: "short 1-byteoffsets") or 0 (i.e.: "2-byte offsets").

Given a pointer to the Origin, InnoDB finds the start of the record as follows:

  • Let X = n_fields (the number of fields is by definition equal to the number of entries in the Field Start Offsets Table).
  • If 1byte_offs_flag equals 0, then let X = X * 2 because there are two bytes for each entry instead of just one.
  • Let X = X + 6, because the fixed size of Extra Bytes is 6.
  • The start of the record is at (pointer value minus X).

FIELD CONTENTS

The Field Contents part of the record has all the data. Fields are stored in the order they were defined in.

There are no markers between fields, and there is no marker or filler at the end of a record. Here's an example.

  • I made a table with this definition:

FIELD CONTENTS

To understand what follows, you must know that table T has six columns — not three — because InnoDB automatically added three "system columns" at the start for its own housekeeping. It happens that these system columns are the row ID, the transaction ID, and the rollback pointer, but their values don't matter now. Regard them as three black boxes.

  • I put some rows in the table. My last three INSERT statements were:

High-Altitude-Picture

  • I ran Borland's TDUMP to get a hexadecimal dump of the contents of mysqldataibdata1, which (in my case) is the MySQL/InnoDB data file (on Windows).

Here is an extract of the dump:

extract of the dumpHigh-Altitude-Picture

A reformatted version of the dump, showing only the relevant bytes, looks like this (I've put a line break after each field and added labels):

Reformatted Hexadecimal Dump

Reformatted Hexadecimal Dump

You won't need explanation if you followed everything I've said, but I'll add helpful notes for the three trickiest details.

  • Helpful Notes About "Field Start Offsets":
  • Notice that the sizes of the record's fields, in forward order,are:6, 6, 7, 2, 2, 2.Since each offset is for the start of the "next"field,the hexadecimal offsets are 06,0c(6+6),13(6+6+7),15(6+6+7+2),17(6+6+7+2+2),19(6+6+7+2+2+2). Reversing the order,the Field Start Offsets of the first record are: 19,17,15,13,0c,06.

  • Helpful Notes About "Extra Bytes":
  • Look at the Extra Bytes of the first record: 00 00 78 0D 02 BF. The fourth byte is 0D hexadecimal, which is 1101 binary ... the 110 is the last bits of n_fields (110 binary is 6 which is indeed the number of fields in the record) and the final 1 bit is 1byte_offs_flag. The fifth and sixth bytes, which contain 02 BF, constitute the "next" field. Looking at the original hexadecimal dump, at address 0D42BF (which is position 02BF within the page), you'll see the beginning bytes of System Column #1 of the second row. In other words, the "next" field points to the "Origin" of the following row.

  • Helpful Notes About NULLs:

For the third row, I inserted NULLs in FIELD2 and FIELD3. Therefore in the Field Start Offsets the top bit is on for these fields (the values are 94 hexadecimal, 94 hexadecimal, instead of 14 hexadecimal, 14 hexadecimal). And the row is shorter because the NULLs take no space.

Where to Look For More Information References:

The most relevant InnoDB source-code files are rem0rec.c, rem0rec.ic, and rem0rec.h in the rem ("Record Manager") directory.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

MySQL Topics