# Data Structures

This course contains the basics of Data Structures

Course introduction
Interview Questions
Pragnya Meter Exam

#### Data Structures

Arrays
Array data structure

An array data structure or simply array is a data structure consisting of a collection of elements (values or variables), each identified by one or more integer indices, stored so that the address of each element can be computed from its index tuple by a simple mathematical formula. For example, an array of 10 integer variables, with indices 0 through 9, may be stored as 10 words at memory addresses 2000, 2004, 2008, … 2036 (this memory allocation can vary because some computers use other than 4 bytes to store integer type variables); so that the element with index i has address 2000 + 4 × i[3] Array structures are the computer analog of the mathematical concepts of vector, matrix, and tensor. Indeed, an array with one or two indices is often called a vector or matrix structure, respectively. Arrays are often used to implement tables, especially lookup tables; so the word table is sometimes used as synonym of array.

Arrays are among the oldest and most important data structures, and are used by almost every program and are used to implement many other data structures, such as lists and strings. They effectively exploit the addressing machinery of computers; indeed, in most modern computers (and many external storage devices), the memory is a one-dimensional array of words, whose indices are their addresses. Processors, especially vector processors, are often optimized for array operations.

Arrays are useful mostly because the element indices can be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the elements of an array data structure are required to have the same size and should use the same data representation. The set of valid index tuples and the addresses of the elements (and hence the element addressing formula) are usually fixed while the array is in use.

The terms array and array structure are often used to mean array data type, a kind of data type provided by most high-level programming languages that consists of a collection of values or variables that can be selected by one or more indices computed at run-time. Array types are often implemented by array structures; however, in some languages they may be implemented by hash tables, linked lists, search trees, or other data structures. The terms are also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays.

History

Array structures were used in the first digital computers, when programming was still done in machine language, for data tables, vector and matrix computations, and many other purposes. Von Neumann wrote the first array sorting program (merge sort) in 1945, when the first stored-program computer was still being built. Array indexing was originally done by self-modifying code, and later using index registers and indirect addressing. Some mainframes designed in the 1960s, such as the Burroughs B5000 and its successors, had special instructions for array indexing that included index bounds checking.. Assembly languages generally have no special support for arrays, other than what the machine itself provides. The earliest high-level programming languages, including FORTRAN (1957), COBOL (1960), and ALGOL 60 (1960), had support for multi-dimensional arrays.

Applications

Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records. Arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks, strings, and VLists. One or more large arrays are sometimes used to emulate in -program dynamic memory allocation, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.

Arrays can be used to determine partial or complete control flow in programs, as a compact alternative to (otherwise repetitive), multiple IF statements. They are known in this context as control tables and are used in conjunction with a purpose built interpreter whose control flow is altered according to values contained in the array. The array may contain subroutine pointers (or relative subroutine numbers that can be acted upon by SWITCH statements) – that direct the path of the execution.

The number of indices needed to specify an element is called the dimension, dimensionality, or rank of the array. In standard arrays, each index is restricted to a certain range of consecutive integers (or consecutive values of some enumerated type), and the address of an element is computed by a "linear" formula on the indices.

One-dimensional arrays

The one dimensional arrays are also known as Single dimension array and is a type of Linear Array. In the one dimension array the data type is followed by the variable name which is further followed by the single subscript i.e. the array can be represented in the row or column wise. It contains a single subscript and that is why it is known as one dimensional array because one subscript can either represent a row or a column. As an example consider auto int new[10]; In the given example the array starts with auto storage class and is of integer type named new which can contain 10 elements in it i.e. 0-9. It is not necessary to declare the storage class as the compiler initializes auto storage class by default to every data type After that the data type is declared which is followed by the name i.e. new which can contain 10 entities.

For a vector with linear addressing, the element with index i is located at the address B + c · i, where B is a fixed base address and c a fixed constant, sometimes called the address increment or stride. If the valid element indices begin at 0, the constant B is simply the address of the first element of the array. For this reason, the C programming language specifies that array indices always begin at 0; and many programmers will call that element "zeroth" rather than "first". However, one can choose the index of the first element by an appropriate choice of the base address B. For example, if the array has five elements, indexed 1 through 5, and the base address B is replaced by B − 30c, then the indices of those same elements will be 31 to 35. If the numbering does not start at 0, the constant B may not be the address of any element.

Multidimensional arrays

For a two-dimensional array, the element with indices i,j would have address B + c · i + d · j, where the coefficients c and d are the row and column address increments, respectively.More generally, in a k-dimensional array, the address of an element with indices i1, i2, …, ikis B + c1.i1+c2i2+...+ck.ik

This formula requires only k multiplications and k−1 additions, for any array that can fit in memory. Moreover, if any coefficient is a fixed power of 2, the multiplication can be replaced by bit shifting. The coefficients ck must be chosen so that every valid index tuple maps to the address of a distinct element. If the minimum legal value for every index is 0, then B is the address of the element whose indices are all zero. As in the one-dimensional case, the element indices may be changed by changing the base address B. Thus, if a two-dimensional array has rows and columns indexed from 1 to 10 and 1 to 20, respectively, then replacing B by B + c1 - − 3 c1 will cause them to be renumbered from 0 through 9 and 4 through 23, respectively. Taking advantage of this feature, some languages (like FORTRAN 77) specify that array indices begin at 1, as in mathematical tradition; while other languages (like Fortran 90, Pascal and Algol) let the user choose the minimum value for each index.

Dope vectors

The addressing formula is completely defined by the dimension d, the base address B, and the increments c1, c2, … , ck. It is often useful to pack these parameters into a record called the array's descriptor or dope vector. The size of each element, and the minimum and maximum values allowed for each index may also be included in the dope vector. The dope vector is a complete handle for the array, and is a convenient way to pass arrays as arguments to procedures. Many useful array slicing operations (such as selecting a sub array, swapping indices, or reversing the direction of the indices) can be performed very efficiently by manipulating the dope vector.

Compact layouts

Often the coefficients are chosen so that the elements occupy a contiguous area of memory. However, that is not necessary. Even if arrays are always created with contiguous elements, some array slicing operations may create non -contiguous sub -arrays from them. There are two systematic compact layouts for a two -dimensional array. For example, consider the matrix

In the row-major order layout (adopted by C for statically declared arrays), the elements of each row are stored in consecutive positions:

In Column-major order (traditionally used by Fortran), the elements of each column are consecutive in memory:

For arrays with three or more indices, "row major order" puts in consecutive positions any two elements whose index tuples differ only by one in the last index. "Column major order" is analogous with respect to the first index. In systems which use processor cache or virtual memory, scanning an array is much faster if successive elements are stored in consecutive positions in memory, rather than sparsely scattered. Many algorithms that use multidimensional arrays will scan them in a predictable order. A programmer (or a sophisticated compiler) may use this information to choose between row- or column-major layout for each array. For example, when computing the product A·B of two matrices, it would be best to have A stored in row major order, and B in column-major order.

Array resizing

Static arrays have a size that is fixed at allocation time and consequently do not allow elements to be inserted or removed. However, by allocating a new array and copying the contents of the old array to it, it is possible to effectively implement a dynamic or growable version of an array; see dynamic array. If this operation is done infrequently, insertions at the end of the array require only amortized constant time. Some array data structures do not reallocate storage, but do store a count of the number of elements of the array in use, called the count or size. This effectively makes the array a dynamic array with a fixed maximum size or capacity; Pascal strings are examples of this.

Non-linear formulas

More complicated ("non-linear") formulas are occasionally used. For a compact two-dimensional triangular array, for instance, the addressing formula is a polynomial of degree 2.

Efficiency

Both store and select take (deterministic worst case) constant time. Arrays take linear (O(n)) space in the number of elements n that they hold. In an array with element size k and on a machine with a cache line size of B bytes, iterating through an array of n elements requires the minimum of ceiling(nk/B) cache misses, because its elements occupy contiguous memory locations. This is roughly a factor of B/k better than the number of cache misses needed to access n elements at random memory locations. As a consequence, sequential iteration over an array is noticeably faster in practice than iteration over many other data structures, a property called locality of reference (this does not mean however, that using a perfect hash or trivial hash within the same (local) array, will not be even faster - and achievable in constant time).

Libraries provide low-level optimized facilities for copying ranges of memory (such as memcpy) which can be used to move contiguous blocks of array elements significantly faster than can be achieved through individual element access. The speedup of such optimized routines varies by array element size, architecture, and implementation. Memory-wise, arrays are compact data structures with no per-element overhead. There may be a per-array overhead, e.g. to store index bounds, but this is language-dependent. It can also happen that elements stored in an array require less memory than the same elements stored in individual variables, because several array elements can be stored in a single word; such arrays are often called packed arrays.

An extreme (but commonly used) case is the bit array, where every bit represents a single element. A single octet can thus hold up to 256 different combinations of up to 8 different conditions, in the most compact form. Array accesses with statically predictable access patterns are a major source of data parallelism.

Efficiency comparison with other data structures

Growable arrays are similar to arrays but add the ability to insert and delete elements; adding and deleting at the end is particularly efficient. However, they reserve linear (Θ(n)) additional storage, whereas arrays do not reserve additional storage. Associative arrays provide a mechanism for array-like functionality without huge storage overheads when the index values are sparse. For example, an array that contains values only at indexes 1 and 2 billion may benefit from using such a structure. Specialized associative arrays with integer keys include Patricia tries, Judy arrays, and van Emde Boas trees.

Balanced trees require O(log n) time for indexed access, but also permit inserting or deleting elements in O(log n) time,[5] whereas growable arrays require linear (Θ(n)) time to insert or delete elements at an arbitrary position. Linked lists allow constant time removal and insertion in the middle but take linear time for indexed access. Their memory use is typically worse than arrays, but is still linear.

An alternative to a multidimensional array structure is to use a one-dimensional array of references to arrays of one dimension less. For two dimensions, in particular, this alternative structure would be a vector of pointers to vectors, one for each row. Thus an element in row i and column j of an array A would be accessed by double indexing (A[i][j] in typical notation). This alternative structure allows ragged or jagged arrays, where each row may have a different size — or, in general, where the valid range of each index depends on the values of all preceding indices. It also saves one multiplication (by the column address increment) replacing it by a bit shift (to index the vector of row pointers) and one extra memory access (fetching the row address), which may be worthwhile in some architectures.

Meaning of dimension

In computer science, the "dimension" of an array is that its domain, namely the number of indices needed to select an element; whereas in mathematics it usually refers to the dimension of the set of all matrices, that is, the number of elements in the array. Thus, an array with 5 rows and 4 columns (hence 20 elements) is said to be "two-dimensional" in computing contexts, but "20-dimensional" in mathematics.

Row- major order

In computing, row-major order and column-major order describe methods for storing multidimensional arrays in linear memory. Following standard matrix notation, rows are identified by the first index of a two-dimensional array and columns by the second index. Array layout is critical for correctly passing arrays between programs written in different languages. It is also important for performance when traversing an array because accessing array elements that are contiguous in memory is usually faster than accessing elements which are not, due to caching. Row-major order is used in C; column major order is used in Fortran and Matlab.

Row-major order

In row-major storage, a multidimensional array in linear memory is accessed such that rows are stored one after the other. It is the approach used by the C programming language as well as many other languages, with the notable exceptions of Fortran and MATLAB. When using row-major order, the difference between addresses of array cells in increasing rows is larger than addresses of cells in increasing columns. For example, consider this 2×3 array:

Declaring this array in C as

`int A[2][3] = { {1, 2, 3}, {4, 5, 6} };`

would find the array laid-out in linear memory as:

`1 2 3 4 5 6`

The difference in offset from one column to the next is 1 and from one row to the next is 3. The linear offset from the beginning of the array to any given element A[row][column] can then be computed as:

`offset = row*NUMCOLS + column`

Where NUMCOLS is the number of columns in the array. The above formula only works when using the C convention of labeling the first element 0. In other words, row 1, column 2 in matrix A, would be represented as A[0][1]

Note that this technique generalizes, so a 2×2×2 array looks like:

```  int A[2][2][2] = {{{1,2}, {3,4}}, {{5,6}, {7,8}}};
and  the array would be laid-out in linear memory as:
1 2 3 4 5 6 7 8```

Column-major order

Column-major order is a similar method of flattening arrays onto linear memory, but the columns are listed in sequence. The programming languages Fortran, MATLAB,[1] Octave and R[2] use column -major ordering. The array

if stored in linear memory with column-major order would look like the following:

`1 4 2 5 3 6`

With columns listed first. The memory offset could then be computed as:

`offset  = row + column*NUMROWS`

where NUMROWS represents the number of rows in the array—in this case, 2. Treating a row-major array as a column-major array is the same as transposing it. Because performing a transpose requires data movement, and is quite difficult to do in-place for non-square matrices, such transpositions are rarely performed explicitly. For example, software libraries for linear algebra, such as the BLAS, typically provide options to specify that certain matrices are to be interpreted in transposed order to avoid the necessity of data movement.

Generalization to higher dimensions

It is possible to generalize both of these concepts to arrays with greater than two dimensions. For higher-dimensional arrays, the ordering determines which dimensions of the array are more consecutive in memory. Any of the dimensions could be consecutive, just as a two-dimensional array could be listed column -first or row-first. The difference in offset between listings of that dimension would then be determined by a product of other dimensions. It is uncommon, however, to have any variation except ordering dimensions first to last or last to first. These two variations correspond to row-major and column-major, respectively.

More explicitly, consider a d-dimensional array with dimensions Nk(k=1...d). A given element of this array isspecified by a tuple (n1,n2,n3,...nd)of d (zero-based) indices nk∈[0,Nk-1] In row-major order, the memory-offset of this element is given by:

In column-major order, the memory-offset of this element is given by:

Note that the difference between row-major and column-major order is simply that the order of the dimensions is reversed. Equivalently, in row-major order the rightmost indices vary faster as one steps through consecutive memory locations, while in column-major order the leftmost indices vary faster.

Dope vector

In computer programming, a dope vector is a data structure used to hold information about a data object[1] , e.g. an array, especially its memory layout. A dope vector typically contains information about the type of array element, rank of an array, the extents of an array, and the stride of an array as well as a pointer to the block in memory containing the array elements. It is often used in compilers to pass entire arrays between procedures in a high level language like Fortran. Pratt T. and M. Zelkowitz, Programming Languages: Design and Implementation (Third Edition), Prentice Hall, Upper Saddle River, NJ, (1996) pp 114

Iliffe vector

In computer programming, an Iliffe vector (also known as a display) is a data structure used to implement multi-dimensional arrays. Named after John K. Iliffe, an Iliffe vector for an n dimensional array (where n > 2) consists of a vector (or 1 dimensional array) of pointers to an n−1 dimensional array. They are often used to avoid the need for expensive multiplication operations when performing address calculation on an array element. They can also be used to implement triangular arrays, or other kinds of irregularly shaped arrays. Their disadvantages include the need for multiple chained pointer indirections to access an element, and the extra work required to determine the next row in an n-dimensional array to allow an optimising compiler to prefetch it.

Both of these are a source of delays on systems where the CPU is significantly faster than main memory. The Iliffe vector for a 2-dimensional array is simply a vector of pointers to vectors of data, i.e., the Iliffe vector represents the columns of an array where each column element is a pointer to a row vector. Multidimensional arrays in languages such as Java and Atlas Autocode are implemented as Iliffe vectors. Iliffe vectors are contrasted with dope vectors in languages such as Fortran, which contain the stride factors and offset values for the subscripts in each dimension.

Dynamic array

In computer science, a dynamic array, growable array, resizable array, dynamic table, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages. A dynamic array is not the same thing as a dynamically-allocated array, which is a fixed-size array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end.

Bounded-size dynamic arrays and capacity

The simplest dynamic array is constructed by allocating a fixed-size array and then dividing it into two parts: the first stores the elements of the dynamic array and the second is reserved, or unused. We can then add or remove elements at the end of the dynamic array in constant time by using the reserved space, until this space is completely consumed. The number of elements used by the dynamic array contents is its logical size or size, while the size of the underlying array is called the dynamic array's capacity, which is the maximum possible logical size. In applications where the logical size is bounded, this data structure suffices. Resizing the underlying array is an expensive operation, typically involving copying the entire contents of the array.

Geometric expansion and amortized cost

To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, such as doubling in size, and use the reserved space for future expansion. The operation of adding an element to the end might work as follows:

```  function insertEnd(dynarray a, element e)if (a.size = a.capacity)//  resize a to twice its current capacity:
a.capacity ← a.capacity  * 2//  (copy the contents to the new memory location here)
a[a.size] ← e
a.size ← a.size  + 1```

As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: a=3/2[1] and a=2 are commonly-used. Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. Dynamic arrays are a common example when teaching amortized analysis.

Performance

The dynamic array has performance similar to an array, with the addition of new operations to add and remove elements from the end:

• Getting or setting the value at a particular index (constant time)
• Iterating over the elements in order (linear time, good cache performance)
• Inserting or deleting an element in the middle of the array (linear time)
• Inserting or deleting an element at the end of the array (constant amortized time)

Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference and data cache utilization, compactness (low memory use), and random access. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures.

Compared to linked lists, dynamic arrays have faster indexing (constant time versus linear time) and typically faster iteration due to improved locality of reference; however, dynamic arrays require linear time to insert or delete at an arbitrary location, since all following elements must be moved, while linked lists can do this in constant time. This disadvantage is mitigated by the gap buffer and tiered vector variants discussed under Variants below. Also, in a highly-fragmented memory region, it may be expensive or impossible to find contiguous space for a large dynamic array, whereas linked lists do not require the whole data structure to be stored contiguously.

Variants

Gap buffers are similar to dynamic arrays but allow efficient insertion and deletion operations clustered near the same arbitrary location. Some deque implementations use array deques, which allow amortized constant time insertion/removal at both ends, instead of just one end. Goodrichpresented a dynamic array algorithm called Tiered Vectors that provided O(n1/2) performance for order preserving insertions or deletions from the middle of the array. Hashed Array Tree (HAT) is a dynamic array algorithm invented by Sitarski in 1996. Hashed Array Tree wastes order n1/2 amount of storage space, where n is the number of elements in the array. The algorithm has O(1) amortized performance when appending a series of objects to the end of a Hashed Array Tree.

In a 1999 paper, Brodnik et al. describe a tiered dynamic array data structure, which wastes only n1/2 space for n elements at any point in time, and they prove a lower bound showing that any dynamic array must waste this muchspace if the operations are to remain amortized constant time. Additionally, they present a variant where growing andshrinking the buffer has not only amortized but worst-case constant time.Bagwell (2002)[5] presented the VList algorithm, which can be adapted to implement a dynamic array.

Language support

C++'s std::vector is an implementation of dynamic arrays, as are the ArrayList[6] classes supplied with the Java API and the .NET Framework. The generic List<> class supplied with version 2.0 of the .NET Framework is also implemented with dynamic arrays. Delphi and D implement dynamic arrays at the language's core. Many scripting languages such as Perl and PHP offer dynamic arrays as a built-in primitive data type.

Hashed array tree

In computer science, a hashed array tree (HAT) is a dynamic array algorithm invented by Edward Sitarski in 1996. Whereas simple dynamic array data structures based on geometric expansion waste linear (Ω(n)) space, where n is the number of elements in the array, hashed array trees waste only order n1/2 storage space. It can perform indexing in constant (O(1)) time, but indexing is not as quick in practice as for simple dynamic arrays. The algorithm has O(1) amortized performance when appending a series of objects to the end of a hashed array tree. Contrary to its name, it does not use hash functions.

A full Hashed Array Tree with 16 elements

Definitions

As defined by Sitarski, a hashed array tree has a top-level directory containing a power of two number of leaf arrays. All leaf arrays are the same size as the top-level directory. This structure superficially resembles a hash table with array-based collision chains, which is the basis for the name hashed array

tree. A full hashed array tree can hold m2 elements, where m is the size of the top-level directory.

Expansions and size reductions

When a hashed array tree is full, its directory and leaves must be restructured to twice its prior size to accommodate additional append operations. There are multiple alternatives for reducing size: when a Hashed Array Tree is one eighth full, it can be restructured to a smaller, half-full hashed array tree; another option is only freeing unused leaf arrays.

Related data structures

Brodnik et al. presented a dynamic array algorithm with a similar space wastage profile to hashed array trees. Brodnik's implementation retains previously allocated leaf arrays, with a more complicated address calculation function as compared to hashed array trees.

Gap buffer

In computer science, a gap buffer is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in a large buffer in two contiguous segments, with a gap between them for inserting new text. Moving the cursor involves copying text from one side of the gap to the other (sometimes copying is delayed until the next operation that changes the text). Insertion adds new text at the end of the first segment. Deletion increases the size of the gap. The advantage of using a gap buffer over more sophisticated data structures (such as linked lists) is that the text is represented simply as two literal strings, which take very little extra space and which can be searched and displayed very quickly.

The disadvantage is that operations at different locations in the text and ones that fill the gap (requiring a new gap to be created) require re-copying most of the text, which is especially inefficient for large files. The use of gap buffers is based on the assumption that such recopying occurs rarely enough that its cost can be amortized over the more common cheap operations. A gap buffer is used in most Emacs editors.

Example

Below are some examples of operations with buffer gaps. The gap is represented pictorially by the empty space between the square brackets. This representation is a bit misleading: in a typical implementation, the endpoints of the gap are trackedusing pointers or array indices, and the contents of the gap are ignored; this allows, for example, deletions to be done by adjusting a pointer without changing the text in the buffer. It is a common programming practice to use a semi-open interval for the gap pointers, i.e. the start-of-gap points to the invalid character following the last character in the first buffer, and the end -of-gap points to the first valid character in the second buffer (or equivalently, the pointers are considered to point "between" characters).

```  Initial state:
This is the way [ ]out.
User inserts some new text:
This is the way the world started [ ]out.```

User moves the cursor before "started"; system moves "started " from the first buffer to the second buffer. This is the way the world [ ]started out.

```  User adds text filling the gap; system creates new gap:
This is the way the world as we know it [ ]started
out.```

Circular buffer

A circular buffer, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.

.

Uses

An example that could possibly use an overwriting circular buffer is with multimedia. If the buffer is used as the bounded buffer in the producer-consumer problem then it is probably desired for the producer (e.g., an audio generator) to overwrite old data if the consumer (e.g., the sound card) is unable to momentarily keep up. Another example is the digital waveguide synthesis method which uses circular buffers to efficiently simulate the sound of vibrating strings or wind instruments. The "prized" attribute of a circular buffer is that it does not need to have its elements shuffled around when one is consumed. (If a non-circular buffer were used then it would be necessary to shift all elements when one is consumed.) In other words, the circular buffer is well suited as a FIFO buffer while a standard, non-circular buffer is well suited as a LIFO buffer.

How it works

A circular buffer first starts empty and of some predefined length. For example, this is a 7-element buffer:

Assume that a 1 is written into the middle of the buffer (exact starting location does not matter in a circular buffer):

Then assume that two more elements are added — 2 & 3 — which get appended after the 1:

If two elements are then removed from the buffer then they come from the end. The two elements removed, in this case, are 1 & 2 leaving the buffer with just a 3:

If the buffer has 7 elements then it is completely full:

A consequence of the circular buffer is that when it is full and a subsequent write is performed, then it starts overwriting the oldest data. In this case, two more elements — A & B — are added and they overwrite the 3 & 4:

Alternatively the routines that manage the buffer could easily not allow data to be overwritten and return an error or raise an exception. Whether or not data is overwritten is up to the semantics of the buffer routines or the application using the circular buffer. Finally, if two elements are now removed then what would be returned is not 3 & 4 but 5 & 6 because A & B overwrote the 3 & the 4 yielding the buffer with:

Circular buffer mechanics

What is not shown in the example above is the mechanics of how the circular buffer is managed.

Start / End Pointers

Generally, a circular buffer requires three pointers:

• one to the actual buffer in memory
• one to point to the start of valid data
• one to point to the end of valid data

Alternatively, a fixed-length buffer with two integers to keep track of indices can be used in languages that do not have pointers. Taking a couple of examples from above. (While there are numerous ways to label the pointers and exact semantics can vary, this is one way to do it.)

This image shows a partially-full buffer:

This image shows a full buffer with two elements having been overwritten:

What to note about the second one is that after each element is overwritten then the start pointer is incremented as well.

Difficulties

Full / Empty Buffer Distinction

A small disadvantage of relying on pointers or relative indices of the start and end of data is, that in the case the buffer is entirely full, both pointers pointing at the same element:

This is exactly the same situation as when the buffer is empty:

To solve this problem there are a number of solutions:

• Always keep one slot open.
• Use a fill count to distinguish the two cases.
• Use read and write counts to get the fill count from.
• Use absolute indices.

Always Keep One Slot Open

This simple solution always keeps one slot unallocated. A full buffer has at most FORMULA slots. If both pointers are pointing at the same location, the buffer is empty. If the end pointer, plus one, equals the start pointer, then the buffer is full.

• Very simple and robust.
• You need only the two pointers.

• You can never use the entire buffer.
• If you cannot read over the buffer border, you get a lot of situations where you can only read one element at once.

An example implementation in C:

```  #include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 25
void buffer_char(char c);
char unbuffer_char(void);//a buffer with BUFFER_SIZE slots
char circular_buffer[BUFFER_SIZE];//integers to index circular_buffer
int start, end;
int main(int argc, char *argv[])
{
char sentence[] = {"The quick brown dog jumped over the lazy
fox."};
int i;//add sentence into the bufferfor (i = 0; i < strlen(sentence); i++) {
buffer_char(sentence[i]);
}//read the contents of the bufferwhile(start != end) {
printf("%c", unbuffer_char());
}
printf("n");return 0;
}
void buffer_char(char c)
{//Use modulo as a trick to wrap around the end of the buffer
back to the beginningif ((end + 1) % BUFFER_SIZE != start) {
circular_buffer[end] = c;
end = (end + 1) % BUFFER_SIZE;
}//otherwise, the buffer is full; don't do anything
}
char unbuffer_char(void)
{
char temp = circular_buffer[start];
start = (start + 1) % BUFFER_SIZE;return(temp);
}```

Use a Fill Count

The second simplest solution is to use a fill count. The fill count is implemented as an additional variable which keeps the number of readable items in the buffer. This variable has to be increased if the write (end) pointer is moved, and to be decreased if the read (start) pointer is moved. In the situation if both pointers pointing at the same location, you consider the fill count to distinguish if the buffer is empty or full.

• Simple.
• Needs only one additional variable.

• You need to keep track of a third variable. This can require complex logic, especially if you are working with different threads. Alternately, you can replace the second pointer with the fill count and generate the second pointer as required by incrementing the first pointer by the fill count, modulo buffer size. The advantages are:
• Simple.

Another solution is to keep counts of the number of items written to and read from the circular buffer. Both counts are stored in unsigned integer variables with numerical limits larger than the number of items that can be stored and are allowed to wrap freely from their limit back to zero. The unsigned difference (write _count - read_count) always yields the number of items placed in the buffer and not yet retrieved. This can indicate that the buffer is empty, partially full, completely full (without waste of a storage location) or in a state of overrun. The advantage is:

• The source and sink of data can implement independent policies for dealing with a full buffer and overrun while adhering to the rule that only the source of data modifies the write count and only the sink of data modifies the read count. This can result in elegant and robust circular buffer implementations even in multi -threaded environments. The disadvantage is:
• You need two additional variables.

Absolute indices

If indices are used instead of pointers, indices can store read/write counts instead of the offset from start of the buffer. This is similar to the above solution, except that there are no separate variables, and relative indices are obtained on the fly by division modulo the buffer's length.

• No extra variables are needed.

• Every access needs an additional modulo operation.
• If counter wrap is possible, complex logic can be needed if the buffer's length is not a divisor of the counter's capacity. On binary computers, both of these disadvantages disappear if the buffer's length is a power of two—at the cost of a constraint on possible buffers lengths.

A little bit more complex are multiple read pointers on the same circular buffer. This is useful if you have n threads, which are reading from the same buffer, but one thread writing to the buffer.

Chunked Buffer

Much more complex are different chunks of data in the same circular buffer. The writer is not only writing elements to the buffer, it also assigns these elements to chunks. The reader should not only be able to read from the buffer, it should also get informed about the chunk borders. Example: The writer is reading data from small files, writing them into the same circular buffer. The reader is reading the data, but needs to know when and which file is starting at a given position.

Optimization

A circular-buffer implementation may be optimized by mapping the underlying buffer to two contiguous regions of virtual memory. (Naturally, the underlying buffer‘s length must then equal some multiple of the system’s page size.) Reading from and writing to the circular buffer may then be carried out with greater efficiency by means of direct memory access; those accesses which fall beyond the end of the first virtual-memory region will automatically wrap around to the beginning of the underlying buffer. When the read offset is advanced into the second virtual-memory region, both offsets—read and write—are decremented by the length of the underlying buffer.

Exemplary POSIX Implementation

```  #include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>
#define report_exceptional_condition() abort ()struct ring_buffer
{
unsigned long count_bytes;
unsigned long write_offset_bytes;
};//Warning order should be  at least 12 for Linux
void
ring_buffer_create (struct ring_buffer *buffer, unsigned long order)
{
char path[] = "/dev/shm/ring-buffer-XXXXXX";
int file_descriptor;
int status;
file_descriptor = mkstemp (path);if (file_descriptor < 0)
report_exceptional_condition ();
report_exceptional_condition ();
buffer->count_bytes = 1UL << order;
buffer->write_offset_bytes = 0;
status = ftruncate  (file_descriptor, buffer->count_bytes);if (status)
report_exceptional_condition ();
buffer->address = mmap (NULL, buffer->count_bytes << 1, PROT_NONE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);if (buffer->address == MAP_FAILED)
report_exceptional_condition ();
report_exceptional_condition ();
report_exceptional_condition ();
status = close (file_descriptor);if (status)
report_exceptional_condition ();
}
void
ring_buffer_free (struct ring_buffer *buffer)
{
int status;
status = munmap (buffer->address, buffer->count_bytes << 1);if (status)
report_exceptional_condition ();
}
void *
{return buffer->address + buffer->write_offset_bytes; /*** void pointerarithmetic is a constraint  violation. ***/
}
void
unsigned long count_bytes)
{
buffer->write_offset_bytes += count_bytes;
}
void *
}
void
unsigned long count_bytes)
{
{
buffer->write_offset_bytes -= buffer->count_bytes;
}
}
unsigned long
ring_buffer_count_bytes (struct ring_buffer *buffer)
}
unsigned long
ring_buffer_count_free_bytes (struct ring_buffer *buffer)
{return buffer->count_bytes - ring_buffer_count_bytes (buffer);
}
void
ring_buffer_clear (struct ring_buffer *buffer)
{
buffer->write_offset_bytes = 0;
}```

Sparse array

In computer science, a sparse array is an array in which most of the elements have the same value (known as the default value—usually 0 or null). A naive implementation of an array may allocate space for the entire array, but in the case where there are few non-default values, this implementation is inefficient. Typically the algorithm used instead of an ordinary array is determined by other known features (or statistical features) of the array, for instance if the sparsity is known in advance, or if the elements are arranged according to some function (e.g. occur in blocks).

As an example, a spreadsheet containing 100×100 mostly empty cells might be more efficiently stored as a linked list rather than an array containing ten thousand array elements. A heap memory allocator inside a program might choose to store regions of blank space inside a linked list rather than storing all of the allocated regions in, say a bit array.

Bit field

A bit field is a common idiom used in computer programming to compactly store a value as a short series of bits. A bit field is most commonly used to represent integral types of known, fixed bit-width. Perhaps the most well known usage of bit-fields is to represent single bit flags, with each flag stored in a separate bit. A bit field is distinguished from a bit array in that the latter is used to store a large set of bits indexed by integers and is often wider than any integral type supported by the language. Bit fields, on the other hand, typically fit within a machine word, and the denotation of bits is independent of their numerical index.

Examples

Example implementation in C:

```  #define PREFERENCE_LIKES_ICE_CREAM (1 << 0) /* 0x01 */
#define PREFERENCE_PLAYS_GOLF (1 << 1) /* 0x02 */
#define PREFERENCE_WATCHES_TV (1 << 2) /* 0x04 */
#define PREFERENCE_READS_BOOKS (1 << 3) /* 0x08 */
unsigned char preference;
void set_preference(unsigned char flag) {
preference |= flag;
}
void reset_preference(unsigned char flag) {
preference &= ~flag;
}
unsigned char get_preference(unsigned char flag) {return (preference & flag) != 0;
}```

Instead of using hardcoded numerical representations for the powers of two (0x08), the use of the bit shift operator (1 << 3) with an incrementing shift operand is recommended for easier readability.

The C Programming Language, describes a method for defining and accessing fields directly. Using this method, bitwise operators are not needed as bit members can be accessed the same as struct members. An example using a struct follows:

```  struct preferences {
unsigned int likes_ice_cream : 1;
unsigned int plays_golf : 1;
unsigned int watches_tv : 1;
};```
```  struct preferences fred;
fred.likes_ice_cream = 1;
fred.plays_golf = 1;
fred.watches_tv = 1;
fred.reads_books = 0;if (fred.likes_ice_cream == 1)/* ... */```

However, bit members in structs have potential practical drawbacks. First, the ordering of bits in memory is cpu dependent and memory padding rules can vary between compilers. In addition, less well optimized compilers sometimes generate poor quality code for reading and writing bit members and there are potentially thread safety issues relating to bit fields because most machines cannot manipulate arbitrary sets of bits in memory, but must instead load and store whole words;[1] e.g. the following would not be thread-safe, in spite of the use of a mutex for each member:

```  struct foo {
int flag : 1;
int counter : 15;
};struct foo my_foo;/* ... *//* in thread 1 */
my_foo.flag = !my_foo.flag;
++my_foo.counter;

The root of the problem is that on most machines it is impossible to load and store flag and counter separately, when both are stored in the same word. In order for this to be thread-safe you should use a single mutex to protect both flag and counter, instead of two.

Homogeneity and mathematical structure

In the examples above, the individual power -of -two values are declared as macros (ending up being the machine word). Since bitfields are by essence destined to be combined with the bitwise OR operator, such code

```  enum preference {
preference_likes_ice_cream = 1<<0,
preference_plays_golf = 1<<1,
preference_likes_watching_tv = 1<<2,
};```

fails the type -safety principle when combining preference _plays _golf| preference _likes_ice_cream, that does not belong to the enumeration.

Quantities defined as the combination of bits are actually elements of the elementary abelian group (Z/2Z)n; and the relation defined as a<=b when a&b=a only creates a partial order (1011b is greater than 1010b but 1011b and 1101b are not comparable).

This remark is of interest when designing variable importance debug channels (from `informative' to `fatal'); the regular integer comparison cannot be used to filter out part of the messages. Nevertheless, a bit field can be safely and elegantly implemented using a bit array where the bit indices for each flag are values of an enumerated type (like the EnumSet class in Java); this avoids the dangers of direct bitwise manipulations.

Bit array

A bit array (also known as a bitmap, a bitset, or a bitstring) is an array data structure that compactly stores individual bits (boolean values). It implements a simple set data structure storing a subset of {1,2,...,n} and is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

Basic operations

Although most machines are not able to address individual bits in memory, nor have instructions to manipulate single bits, each bit in a word can be singled out and manipulated using bitwise operations. In particular:

• OR can be used to set a bit to one: 11101010 OR 00000100 = 11101110
• AND can be used to set a bit to zero: 11101010 AND 11111101 = 11101000
• AND together with zero-testing can be used to determine if a bit is set:

```  11101010 AND  00000001 = 00000000 = 0
11101010 AND 00000010 = 00000010 ≠ 0```

• XOR can be used to invert or toggle a bit:

```  11101010 XOR  00000100 = 11101110
11101110 XOR 00000100 = 11101010```

To obtain the bit mask needed for these operations, we can use a bit shift operator to shift the number 1 to the left by the appropriate number of places, as well as bitwise negation if necessary.

We can view a bit array as a subset of {1,2,...,n}, where a 1 bit indicates a number in the set and a 0 bit a number not in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred. Given two bit arrays of the same size representing sets, we can compute their union, intersection, and set-theoretic difference using n/w simple bit operations each (2n/w for difference), as well as the complement of either:

```  for i from 0 to n/w-1
complement_a[i] := not a[i]
union[i] := a[i] or b[i]
intersection[i] := a[i] and b[i]
difference[i] := a[i] and (not b[i])```

If we wish to iterate through the bits of a bit array, we can do this efficiently using a doubly-nested loop that loops through each word, one at a time. Only n/w memory accesses are required:

```  for i from 0 to n/w-1
index := 0 // if needed
word := a[i]for b from 0 to w-1
value := word and 1 ≠ 0
word := word shift right 1//  do something with value
index  := index + 1 // if needed```

Both of these code samples exhibit ideal locality of reference, and so get a large performance boost from a data cache. If a cache line is k words, only about n/wk cache misses will occur.

More complex operations

Population / Hamming weight

If we wish to find the number of 1 bits in a bit array, sometimes called the population function, or Hamming weight, there are efficient branch-free algorithms that can compute the number of bits in a word using a series of simple bit operations. We simply run such an algorithm on each word and keep a running total. Counting zeros is similar. See the Hamming weight article for examples of an efficient implementation.

Sorting

Similarly, sorting a bit array is trivial to do in O(n) time using counting sort — we count the number of ones k, fill the last k/w words with ones, set only the low k mod w bits of the next word, and set the rest to zero.

Inversion

Vertical flipping of a one -bit -per -pixel image, or some FFT algorithms, require to flip the bits of individual words (so b31 b30 ... b0 becomes b0 ... b30 b31). When this operation is not available on the processor, it's still possible to proceed by successive passes, in this example on 32 bits:

```  exchange two 16bit halfwords
exchange bytes by pairs (0xddccbbaa -> 0xccddaabb)
...
swap bits by pairs
swap bits (b31 b30 ... b1 b0 -> b30 b31 ... b0 b1)
The last operation can be written ((x&0x55555555)<<1) | (x&0xaaaaaaaa)>>1)).```

Find first one

Bit arrays are useful in some contexts as priority queues. The goal in such a context is to identify the one bit of smallest index, that is the least significant bit has the highest priority. Some machines (including normal x86 PCs) have a find first one or find first zero operation that does this on a single word. With this, the operation is obvious: find the first nonzero word and run find first one on it, or find first zero on its complement. On machines that do not feature this operation, the operation can be reproduced using sequences of bit operations. On machines that use two's complement arithmetic, which includes all conventional CPUs, the find first one function can be performed quickly by anding a word with its two's complement, that is performing (w AND -w) results in a word with only the rightmost bit set of the bits that were set before the operation. For instance, if the original value were 6 (110), after this operation the result would be 2 (010). Find-first-zero operation, starting from the leftmost bit, is equivalent to computing the base 2 logarithm. This helps computing rapidly such quantities on machines that comprise it in their basic instruction set (ex: clz in MIPS).

Compression

Large bit arrays tend to have long streams of zeroes or ones. This phenomenon wastes storage and processing time. Run-length encoding is commonly used to compress these long streams. However, by compressing bit arrays too aggressively we run the risk of losing the benefits due to bit-level parallelism (vectorization). Thus, instead of compressing bit arrays as streams of bits, we might compress them as streams bytes or words (see Bitmap index (compression)).

Bit arrays, despite their simplicity, have a number of marked advantages over other data structures for the same problems:

• They are extremely compact; few other data structures can store n independent pieces of data in n/w words.
• They allow small arrays of bits to be stored and manipulated in the register set for long periods of time with no memory accesses.
• Because of their ability to exploit bit-level parallelism, limit memory access, and maximally use the data cache, they often outperform many other data structures on practical data sets, even those that are more asymptotically efficient. However, bit arrays aren't the solution to everything. In particular:
• Without compression, they are wasteful set data structures for sparse sets (those with few elements compared to their range) in both time and space. For such applications, compressed bit arrays, Judy arrays, tries, or even Bloom filters should be considered instead.
• Accessing individual elements can be expensive and difficult to express in some languages. If random access is more common than sequential and the array is relatively small, a byte array may be preferable on a machine with byte addressing. A word array, however, is probably not justified due to the huge space overhead and additional cache misses it causes, unless the machine only has word addressing.

Applications

Because of their compactness, bit arrays have a number of applications in areas where space or efficiency is at a premium. Most commonly, they are used to represent a simple group of boolean flags or an ordered sequence of boolean values. We mentioned above that bit arrays are used for priority queues, where the bit at index k is set if and only if k is inthe queue; this data structure is used, for example, by the Linux kernel, and benefits strongly from a find-first-zero operation in hardware. Bit arrays can be used for the allocation of memory pages, inodes, disk sectors, etc. In such cases, the term bitmap may be used.

However, this term is frequently used to refer to raster images, which may use multiple bits per pixel. Another application of bit arrays is the Bloom filter, a probabilistic set data structure that can store large sets in a small space in exchange for a small probability of error. It is also possible to build probabilistic hash tables based on bit arrays that accept either false positives or false negatives. Bit arrays and the operations on them are also important for constructing succinct data structures, which use close to the minimum possible space. In this context, operations like finding the nth 1 bit or counting the number of 1 bits up to a certain position become important.

Bit arrays are also a useful abstraction for examining streams of compressed data, which often contain elements that occupy portions of bytes or are not byte-aligned. For example, the compressed Huffman coding representation of a single 8-bit character can be anywhere from 1 to 255 bits long. In information retrieval, bit arrays are a good representation for the posting lists of very frequent terms. If we compute the gaps between adjacent values in a list of strictly increasing integers and encode them using unary coding, the result is a bit array with a 1 bit in the nth position if and only if n is in the list.

The implied probability of a gap of n is 1/2n. This is also the special case of Golomb coding where the parameter M is 1; this parameter is only normally selected when -log(2-p)/log(1-p) ≤ 1, or roughly the term occurs in at least 38% of documents.

Language support

The C programming language's bitfields, pseudo-objects found in structs with size equal to some number of bits, are in fact small bit arrays; they are limited in that they cannot span words. Although they give a convenient syntax, the bits are still accessed using bitwise operators on most machines, and they can only be defined statically (like C's static arrays, their sizes are fixed at compile-time). It is also a common idiom for C programmers to use words as small bit arrays and access bits of them using bit operators. A widely available header file included in the X11 system, xtrapbits.h, is "a portable way for systems to define bit field manipulation of arrays of bits.".

In C++, although individual bools typically occupy the same space as a byte or an integer, the STL type vector<bool> is a partial specialization in which bits are packed as a space efficiency optimization. Since bytes (and not bits) are the smallest addressable unit in C++, the [] operator does not return a reference to an element, but instead returns a proxy reference. This might seem a minor point, but it means that vector<bool> is not a standard STL container, which is why the use of vector<bool> is generally discouraged. Another unique STL class, bitset, creates a vector of bits fixed at a particular size at compile-time, and in its interface and syntax more resembles the idiomatic use of words as bit sets by C programmers. It also has some additional power, such as the ability to efficiently count the number of bits that are set. The Boost C++ Libraries provide a dynamic _bitset class whose size is specified at run-time.

The D programming language provides bit arrays in both of its competing standard libraries. In Phobos, they are provided in std.bitmanip, and in Tango, they are provided in tango.core.BitArray. As in C++, the [] operator does not return a reference, since individual bits are not directly addressable on most hardware, but instead returns a bool. In Java, the class BitSet creates a bit array that is then manipulated with functions named after bitwise operators familiar to C programmers. Unlike the bitset in C++, the Java BitSet expands dynamically if a bit is set at an index beyond the current size of the bit vector. In addition, there is a class EnumSet, which represents a Set of values of an enumerated type internally as a bit vector, as a safer alternative to bitfields.

The .NET Framework supplies a BitArray collection class. It stores boolean values, supports random access and bitwise operators, can be iterated over, and its Length property can be changed to grow or truncate it. Although Standard ML has no support for bit arrays, Standard ML of New Jersey has an extension, the BitArray structure, in its SML/NJ Library. It is not fixed in size and supports set operations and bit operations, including, unusually, shift operations. Haskell likewise currently lacks standard support for bitwise operations, but both GHC and Hugs provide a Data.Bits module with assorted bitwise functions and operators, including shift and rotate operations. In Perl, strings can be used as expandable bit arrays. They can be manipulated using the usual bitwise operators (~ | & ^)[4], and individual bits can be test and set using the vec [5] function.

Bitboard

A bitboard is a data structure commonly used in computer systems that play board games.

Definition

A bitboard, often used for boardgames such as chess, checkers and othello, is a specialization of the bitset data structure, where each bit represents a game position or state, designed for optimization of speed and/or memory or disk use in mass calculations. Bits in the same bitboard relate to each other in the rules of the game often forming a game position when taken together. Other bitboards are commonly used as masks to transform or answer queries about positions.

The "game" may be any game-like system where information is tightly packed in a structured form with "rules" affecting how the individual units or pieces relate.

Short description

Bitboards are used in many of the world's best chess playing programs. They help the programs analyze chess positions with few CPU instructions and hold a massive number of positions in memory efficiently. Bitboards are interesting because they allow the computer to answer some questions about game state with one logical operation. For example, if a chess program wants to know if the white player has any pawns in the center of the board (center four squares) it can just compare a bitboard for the player's pawns with one for the center of the board using a logical AND operation.

If there are no center pawns then the result will be zero. Query results can also be represented using bitboards. For example, the query "What are the squares between X and Y?" can be represented as a bitboard. These query results are generally pre calculated, so that a program can simply retrieve a query result with one memory load. However, as a result of the massive compression and encoding, bitboard programs are not easy for software developers to either write or debug.

History

The bitboard method for holding a board game appears to have been invented in the mid 1950's, by Arthur Samuel and was used in his checkers program. The method was published in 1959 as "Some Studies in Machine Learning Using the Game of Checkers" in the IBM Journal of Research and Development. For the more complicated game of chess, it appears the method was independently rediscovered later by the Kaissa team in the Soviet Union in the late 1960s, although not publicly documented, and again by the authors of the U.S. Northwestern University program "Chess" in the early 1970s, and documented in 1977 in "Chess Skill in Man and Machine".

Description for all games or applications

A bitboard or bit field is a format that stuffs a whole group of related boolean variables into the same integer, typically representing positions on a board game. Each bit is a position, and when the bit is positive, a property of that position is true. In chess, for example, there would be a bitboard for black knights. There would be 64-bits where each bit represents a chess square. Another bitboard might be a constant representing the center four squares of the board. By comparing the two numbers with a bitwise logical AND instruction, we get a third bitboard which represents the black knights on the center four squares, if any. This format is generally more CPU and memory friendly than others.

Processor use

Pros

The advantage of the bitboard representation is that it takes advantage of the essential logical bitwise operations available on nearly all CPUs that complete in one cycle and are full pipelined and cached etc. Nearly all CPUs have AND, OR, NOR, and XOR. Many CPUs have additional bit instructions, such as finding the "first" bit, that make bitboard operations even more efficient. If they do not have instructions well known algorithms can perform some "magic" transformations that do these quickly. Furthermore, modern CPUs have instruction pipelines that queue instructions for execution. A processor with multiple execution units can perform more than one instruction per cycle if more than one instruction is available in the pipeline.

Branching (the use of conditionals like if) makes it harder for the processor to fill its pipeline(s) because the CPU can't tell what it needs to do in advance. Too much branching makes the pipeline less effective and potentially reduces the number of instructions the processor can execute per cycle. Many bitboard operations require fewer conditionals and therefore increase pipelining and make effective use of multiple execution units on many CPUs. CPUs have a bit width which they are designed toward and can carry out bitwise operations in one cycle in this width. So, on a 64-bit or more CPU, 64-bit operations can occur in one instruction.

There may be support for higher or lower width instructions. Many 32-bit CPUs may have some 64-bit instructions and those may take more than one cycle or otherwise be handicapped compared to their 32-bit instructions. If the bitboard is larger than the width of the instruction set, then a performance hit will be the result. So a program using 64-bit bitboards would run faster on a real 64-bit processor than on a 32-bit processor.

Cons

Some queries are going to take longer than they would with perhaps arrays, but bitboards are generally used in conjunction with array boards in chess programs.

Memory use

Pros

Bitboards are extremely compact. Since only a very small amount of memory is required to represent a position or a mask, more positions can find their way into registers, full speed cache, Level 2 cache, etc. In this way, compactness translates into better performance (on most machines). Also on some machines this might mean that more positions can be stored in main memory before going to disk.

Cons

For some games writing a suitable bitboard engine requires a fair amount of source code that will be longer than the straight forward implementation. For limited devices (like cell phones) with a limited number of registers or processor instruction cache, this can cause a problem. For full sized computers it may cause cache misses between level one and level two cache. This is a potential problem--not a major drawback. Most machines will have enough instruction cache so that this isn't an issue.

Source code

Bitboard source code is very dense and sometimes hard to read. It must be  documented very well.

Chess bitboards

Standard

The first bit usually represents A1 (the lower left square), and the 64th bit represents H8 (the diagonally opposite square). There are twelve types of pieces, and each type gets its own bitboard. Black pawns get a board, white pawns, etc. Together these twelve boards can represent a position. Some trivial information also needs to be tracked elsewhere; the programmer may use boolean variables for whether each side is in check, can castle, etc. Constants are likely available, such as WHITE _SQUARES, BLACK _SQUARES, FILE _A, RANK _4 etc. More interesting ones might include CENTER, CORNERS, CASTLE _SQUARES, etc. Examples of variables would be WHIT E_ATTACKING, ATTACKED _BY _PAWN, WHITE _PASSED _PAWN, etc.

Rotated

"Rotated" bitboards are usually used in programs that use bitboards. Rotated bitboards make certain operations more efficient. While engines are simply referred to as "rotated bitboard engines," this is a misnomer as rotated boards are used in addition to normal boards making these hybrid standard/rotated bitboard engines. These bitboards rotate the bitboard positions by 90 degrees, 45 degrees, and/or 315 degrees. A typical bitboard will have one byte per rank of the chess board. With this bitboard it's easy to determine rook attacks across a rank, using a table indexed by the occupied square and the occupied positions in the rank (because rook attacks stop at the first occupied square). By rotating the bitboard 90 degrees, rook attacks across a file can be examined the same way.

Adding bitboards rotated 45 degrees and 315 degrees produces bitboards in which the diagonals are easy to examine. The queen can be examined by combining rook and bishop attacks. Rotated bitboards appear to have been developed separately and (essentially) simultaneously by the developers of the DarkThought and Crafty programs. The Rotated bitboard is hard to understand if one doesn't have a firm grasp of normal bitboards and why they work. Rotated bitboards should be viewed as a clever but advanced optimization.

Magics

Magic move bitboard generation is a new and fast alternative to rotated move bitboard generators. These are also more versatile than rotated move bitboard generators because the generator can be used independently from any position. The basic idea is that you can use a multiply, right-shift hashing function to index a move database, which can be as small as 1.5K. A speedup is gained because no rotated bitboards need to be updated, and because the lookups are more cache-friendly.

Other bitboards

Many other games besides chess benefit from bitboards.

• In Connect Four, they allow for very efficient testing for four consecutive discs, by just two shift+and operations per direction.
• In the Conway's Game of Life, they are a possible alternative to arrays.
• Othello/Reversi (see the Reversi article).

Parallel array

In computing, a parallel array is a data structure for representing arrays of records. It keeps a separate, homogeneous array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory. For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.

An example in C using parallel arrays:

```  int ages[] = {0, 17, 2, 52, 25};
char *names[] = {"None", "Mike", "Billy", "Tom","Stan"};
int parent[] = {0 /*None*/, 3 /*Tom*/, 1 /*Mike*/, 0 /*None*/, 3/*Tom*/};for(i = 1; i <= 4; i++) {
printf("Name: %s, Age: %d, Parent: %s n",
names[i], ages[i], names[parent[i]]);
}```

Or, in Python:

```  firstName = ['Joe', 'Bob', 'Frank', 'Hans' ]
lastName = ['Smith','Seger','Sinatra','Schultze']
heightInCM = [169, 158, 201, 199 ]for i in xrange(len(firstName)):print "Name: %s %s" % (firstName[i], lastName[i])print "Height in CM: %s" % heightInCM[i]```

Parallel arrays have a number of practical advantages over the normal approach:

• They can be used in languages which support only arrays of primitive types and not of records (or perhaps don't support records at all).
• Parallel arrays are simple to understand and use, and are often used where declaring a record is more trouble than it's worth.
• They can save a substantial amount of space in some cases by avoiding alignment issues. For example, one of the fields of the record can be a single bit, and its array would only need to reserve one bit for each record, whereas in the normal approach many more bits would "pad" the field so that it consumes an entire byte or a word.
• If the number of items is small, array indices can occupy significantly less space than full pointers, particularly on architectures with large words.
• Sequentially examining a single field of each record in the array is very fast on modern machines, since this amounts to a linear traversal of a single array, exhibiting ideal locality of reference and cache behavior. However, parallel arrays also have several strong disadvantages, which serves to explain why they are not generally preferred:
• They have significantly worse locality of reference when visiting the records sequentially and examining multiple fields of each record, which is the norm.
• They obscure the relationship between fields of a single record.
• They have little direct language support (the language and its syntax typically express no relationship between the arrays in the parallel array.)
• They are expensive to grow or shrink, since each of several arrays must be reallocated.

The bad locality of reference is the worst issue. However, a compromise can be made in some cases: if a structure can be divided into groups of fields that are generally accessed together, an array can be constructed for each group, and its elements are records containing only these subsets of the larger structure's fields. This is a valuable way of speeding up access to very large structures with many members, while keeping the portions of the structure tied together. An alternative to tying them together using array indexes is to use references to tie the portions together, but this can be less efficient in time and space. Some compiler optimizations, particularly for vector processors, are able to perform this transformation automatically when arrays of structures are created in the program.

Lookup table

In computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant[1], since retrieving a value from memory is often faster than undergoing an 'expensive' computation or input/output operation. The tables may be precalculated and stored in static program storage or calculated (or "pre-fetched") as part of a programs initialization phase (memoization). Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input.

History

Part of a 20th century table of common logarithms in the reference book Abramowitz and Stegun.

Before the advent of computers, printed lookup tables of values were used by people to speed up hand calculations of complex functions, such as in trigonometry, logarithms, and statistical density functions. School children are often taught to memorize "times tables" to avoid calculations of the most commonly used numbers (up to 9 x 9 or 12 x 12). Even as early as 493 A.D., Victorius of Aquitaine wrote a 98-column multiplication table which gave (in Roman numerals) the product of every number from 2 to 50 times and the rows were "a list of numbers starting with one thousand, descending by hundreds to o hundred, then descending by tens to ten, then by ones to one, and then the fractions down to 1/144"

Early in the history of computers, input/output operations were particularly slow - even in comparison to processor speeds of the time. It made sense to reduce expensive read operations by a form of manual caching by creating either static lookup tables (embedded in the program) or dynamic prefetched arrays to contain only the most commonly occurring data items. Despite the introduction of systemwide caching that now automates this process, application level lookup tables can still improve performance for data items that rarely, if ever, change.

Examples

Simple lookup in an array, an associative array or a linked list (unsorted list)

This is known as a linear search or Brute-force search, each element being checked for equality in turn and the associated value, if any, used as a result of the search. This is often the slowest search method unless frequently occurring values occur early in the list. For a one dimensional array or linked list, the lookup is usually to determine whether or not there is a match with an 'input' data value.

• Insertion or deletion of an element at a specific point of a list is a constant time operation. (While one can "delete" an element from an array in constant time by somehow marking its slot as "vacant", an algorithm that iterates over the elements may have to skip a large number of vacant slots).
• arbitrarily many elements may be inserted into a linked list, limited only by the total memory available; while an array will eventually fill up, and then have to be resized — an expensive operation, that may not even be possible
if memory is fragmented. Similarly, an array from which many elements are removed, may have to be resized in order to avoid wasting too much space. On the other hand:
• arrays allow random access, while linked lists allow only sequential access to elements. Singly-linked lists, in fact, can only be traversed in one direction. This makes linked lists unsuitable for applications where it's useful to quickly look up an element by its index , such as heapsort. See also trivial hash function below.
• Sequential access on arrays is also faster than on linked lists on many machines, because they have greater locality of reference and thus benefit more from processor caching.
• linked lists require extra storage needed for references, that often makes them impractical for lists of small data items such as characters or boolean values. It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools. Some hybrid solutions try to combine the advantages of the two representations. Unrolled linked lists store several elements in each list node, increasing cache performance while decreasing memory overhead for references. CDR coding does both these as well, by replacing references with the actual data referenced, which extends off the end of the referencing record.

Binary search in an array or an associative array (sorted list)

This is known as a binary chop search, sometimes referred to as a "Divide and conquer algorithm", each element being found by determining which half of the table a match may be found in and repeating until either success or failure. Only possible if the list is sorted but gives good performance even if the list is lengthy.

Trivial hash function

For a trivial hash function lookup, the unsigned raw data value is used directly as an index to a one dimensional table to extract a result. For small ranges, this can be amongst the fastest lookup, even exceeding binary search speed with zero branches and executing in constant time.

Counting '1' bits in a series of bytes

One discrete problem that is expensive to solve on many computers, is that of counting the number of bits which are set to 1 in a (binary) number, sometimes called the population function. For example, the decimal number "37" is "00100101" in binary, so it contains three bits that are set to binary "1".

A simple example of C code, designed to count the 1 bits in a int, might look like this:

```  int count_ones(unsigned int x) {
int result = 0;while (x != 0)
result++, x = x & (x-1);return result;
}```

This apparently simple algorithm can take potentially hundreds of cycles even on a modern architecture, because it makes many branches in the loop - and branching is slow. This can be ameliorated using loop unrolling and some other more compiler optimizations. There is however a simple and much faster algorithmic solution - using a trivial hash function table lookup. Simply construct a static table, bits_set, with 256 entries giving the number of one bits set in each possible byte value (e.g. 0x00 = 0, 0x01 = 1, 0x02 = 1, and so on). Then use this table to find the number of ones in each byte of the integer using a trivial hash function lookup on each byte in turn, and sum them. This requires no branches, and just four indexed memory accesses, considerably faster than the earlier code.

```  /* (this code assumes that 'int' is 32-bits wide) */
int count_ones(unsigned int x) {return bits_set[ x & 255] + bits_set[(x >> 8) & 255]
+ bits_set[(x >> 16) & 255] + bits_set[(x >> 24) & 255];
}```

The above source can be improved easily, (avoiding OR'ing, and shifting) by 'recasting' 'x' as a 4 byte unsigned char array and, preferably, coded in-line as a single statement instead of being a function. Note that even this simple algorithm can be too slow now, because the original code might run faster from the cache of modern processors, and (large) lookup tables do not fit well in caches and can cause a slower access to memory (in addition, in the above example, it requires computing addresses within a table, to perform the four lookups needed.

L.U.T.s in Image processing

In data analysis applications, such as image processing, a lookup table (LUT) is used to transform the input data into a more desirable output format. For example, a grayscale picture of the planet Saturn will be transformed into a color image to emphasize the differences in its rings. A classic example of reducing run-time computations using lookup tables is to obtain the result of a trigonometry calculation, such as the sine of a value. Calculating trigonometric functions can substantially slow a computing application.

The same application can finish much sooner when it first precalculates the sine o a number of values, for example for each whole number of degrees (The table can be defined as static variables at compile time, reducing repeated run time costs). When the program requires the sine of a value, it can use the lookup table to retrieve the closest sine value from a memory address, and may also take the step of interpolating to the sine of the desired value, instead of calculating by mathematical formula. Lookup tables are thus used by mathematics co-processors in computer systems. An error in a lookup table was responsible for Intel's infamous floating -point divide bug. Functions of a single variable (such as sine and cosine) may be implemented by a simple array.

Functions involving two or more variables require multidimensional array indexing techniques. The latter case may thus employ a two-dimensional array of power[x][y] to replace a function to calculate xy for a limited range of x and y values. Functions that have more than one result may be implemented with lookup tables that are arrays of structures. As mentioned, there are intermediate solutions that use tables in combination with a small amount of computation, often using interpolation. Pre-calculation combined with interpolation can produce higher accuracy for values that fall between two precomputed values. This technique requires slightly more time to be performed but can greatly enhance accuracy in applications that require the higher accuracy. Depending on the values being precomputed, pre-computation with interpolation can also be used to shrink the lookup table size while maintaining accuracy.

In image processing, lookup tables are often called LUTs and give an output value for each of a range of index values. One common LUT, called the colormap or palette, is used to determine the colors and intensity values with which a particular image will be displayed. Windowing in computed tomography refers to a related concept. While often effective, employing a lookup table may nevertheless result in a severe penalty if the computation that the LUT replaces is relatively simple. Memory retrieval time and the complexity of memory requirements can increase application operation time and system complexity relative to what would be required by straight formula computation. The possibility of polluting the cache may also become a problem. Table accesses for large tables will almost certainly cause a cache miss.

This phenomenon is increasingly becoming an issue as processors outpace memory. A similar issue appears in rematerialization, a compiler optimization. In some environments such as the Java programming language, table lookups can be even more expensive due to mandatory bounds -checking involving an additional comparison and branch for each lookup. There are two fundamental limitations on when it is possible to construct a lookup table for a required operation. One is the amount of memory that is available: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time.

The other is the time required to compute the table values in the first instance; although this usually needs to be done only once, if it takes a prohibitively long time, it may make the use of a lookup table an inappropriate solution. As previously stated however, tables can be statically defined in many cases.

Computing sines

Most computers, which only perform basic arithmetic operations, cannot directly calculate the sine of a given value. Instead, they use the CORDIC algorithm or a complex formula such as the following Taylor series to compute the value of sine to a high degree of precision:

However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional computer graphics, that need to compute many thousands of sine values every second. A common solution is to initially compute the sine of many evenly distributed values, and then to find the sine of x we choose the sine of the value closest to x. This will be close to the correct value because sine is a continuous function with a bounded rate of change. For example:

```  real  array sine_table[-1000..1000]for x from -1000 to 1000
sine_table[x] := sine(pi * x / 1000)function lookup_sine(x)return sine_table[round(1000 * x / pi)]```

Linear interpolation on a portion of the sine function

Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating -point numbers are used, over 16,000 bytes would be required. We can use fewer samples, but then our precision will significantly worsen. One good solution is linear interpolation, which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for smooth functions such as the sine function. Here is our example using linear interpolation:

```  function lookup_sine(x)
x1 := floor(x*1000/pi)
y1 := sine_table[x1]
y2 := sine_table[x1+1]return y1 + (y2-y1)*(x*1000/pi-x1)```

When using interpolation, the size of the lookup table can be reduced by using non-uniform sampling, which means that where the function is close to straight, we use few sample points, while where it changes value quickly we use more sample points to keep the approximation close to the real curve. For more information, see interpolation.

Other usage of lookup tables

Caches

Storage caches (including disk caches for files, or processor caches for either for code or data) work also like a lookup table. The table is built with very fast memory instead of being stored on slower external memory, and maintains two pieces of data for a subrange of bits composing an external memory (or disk) address (notably the lowest bits of any possible external address):

• one piece (the tag) contains the value of the remaining bits of the address; if these bits match with those from the memory address to read or write, then the other piece contains the cached value for this address.
• the other piece maintains the data associated to that address.

A single (fast) lookup is performed to read the tag in the lookup table at the index specified by the lowest bits of the desired external storage address, and to determine if the memory address is hit by the cache. When a hit is found, no access to external memory is needed (except for write operations, where the cached value may need to be updated asynchronously to the slower memory after some time, or if the position in the cache must be replaced to cache another address).

Hardware LUTs

In digital logic, an n-bit lookup table can be implemented with a multiplexer whose select lines are the inputs of the LUT and whose outputs are constants. An n-bit LUT can encode any n-input Boolean function by modeling such functions as truth tables. This is an efficient way of encoding Boolean logic functions, and LUTs with 4-6 bits of input are in fact the key component of modern FPGAs.

Top