# Map - Data Structures

An associative array (also associative container, map, mapping, dictionary, finite map, and in query-processing an index or index file) is an abstract data type composed of a collection of unique keys and a collection of values, where each key is associated with one value (or set of values).

The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with the key "bob" is 7, we say that our array maps "bob" to 7.

Associative arrays are very closely related to the mathematical concept of a function with a finite domain. As a consequence, a common and important use of associative arrays is in memoization. From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array. While a regular array maps an integer key (index) to a value of arbitrary data type, an associative array's keys can also be arbitrarily typed.

In some programming languages, such as Python, the keys of an associative array do not even need to be of the same type. Content-addressable memory (CAM) systems use a special type of computer memory to improve the performance of lookups in associative arrays and are used in specialized applications. Several supercomputers from the 1970simplemented CAM directly in hardware, and were known as associative computers.

Operations

The operations that are usually defined for an associative array are:

• Add: Bind a new key to a new value
• Reassign: Bind an old key to a new value
• Remove: Unbind a key from a value and remove the key from the key set
• Lookup: Find the value (if any) that is bound to a key

Basic abstract arrays

An abstract array structure has two main operations, fetch and store. In the functional specification style, where the entities are array states rather than the array themselves, they are

• fetch(A, I): obtains the data stored in the element of the array whose state is A and whose indices I.
• store(A,I,V): the array state that results by setting the value of that element to V in array state A. These operations are required to satisfy the axioms
• fetch(store(A,I, V), I) = V
• fetch(store(A,I, V), J) = fetch(A, J) if I J

for any array state A, any value V, and any tuples I, J for which the operations are defined. The first axiom means that each element behaves like an ordinary variable. The second axiom means that elements with distinct indices behave as disjoint variables, so that storing a value in one element does not affect the value of any other element.

These axioms do not place any constraints on the set of valid index tuples I, therefore this abstract model can be used for triangular matrices and other oddly-shaped arrays, or for general associative arrays.

Variations of abstract arrays

A resizable, growable, or dynamic array is an array ADT that allows the range of indices to be increased without changing the values of its current elements. For one-dimensional arrays, this facility is often provided as an append(A,x) operation that increases the size of the array A by one and then sets the value of the last element to x. Resizable arrays are one popular way of implementing lists.

Examples

One can think of a telephone book as an example of an associative array, where names are the keys and phone numbers are the values. Using the usual array-like notation, we might write

and so on. These entries can be thought of as two records in a database table:

To retrieve the element from the associative array, we use a similar notation i.e.

where for both examples, x = 01-1234-56 and y = 02-4321-56 after execution. Another example would be a dictionary where words are the keys and definitions are the values.

Since a database equivalent is that of a table containing precisely two fields - key and value - we can use an associative array to store any information which can be held in this form.

Data structures for representing

Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).

Efficient representations

There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree (such as a red-black tree or an AVL tree). Skip lists are also an alternative, though relatively new and not as widely used. B-trees (and variants) can also be used, and are commonly used when the associative array is too large to reside entirely in memory, for instance in a simple database. Relative advantages and disadvantages include:

• Asymptotic operation performance: Hash tables have faster average lookup and insertion time, O(1), compared to a balanced binary search tree's Θ(log n), while balanced trees have faster worst-case lookup and insertion time, O(log n) as compared to Θ(n). Skip lists have O(n) worst-case and O(log n) average-case operation times, but with less insertion and deletion overhead in practice than balanced binary trees.
• Ordering preservation: Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently (they require the data to be sorted in a separate step).
• Range queries: Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range. (With n elements in the array and performing the operation on a contiguous range of m keys, a balanced binary tree will take O(log(n)+m) time while a hash table would need Θ(n) time as it needs to search the entire table.)
• Allocation behavior: Hash tables with open addressing store all data in a large contiguous block of memory that is reallocated infrequently, whereas tree allocations perform many small, frequent allocations. As a result hash tables may be difficult to allocate in a fragmented heap, and conversely trees may cause heap fragmentation. Trees are also more vulnerable to inefficiencies in allocators.
• Compactness: Hash tables can have more compact storage for small value types, especially when the values are bits.
• Persistence: There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
• Supporting new key types: Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys.

Sometimes simple implementations of one data structure or the other have disadvantages that can be overcome by better design. For example:

• Hash tables that use untrusted input as keys may be vulnerable to denial-of-service attacks where an untrusted user supplies data intended to generate large numbers of collisions. This may be overcome by choosing hash functions at random from a universal family, or by hashing untrusted input with a cryptographic hash function before insertion.
• Simple balanced trees waste space on pointers and allocation metadata; these problems can be mitigated by storing multiple elements in each node and by using memory pools.

Association lists

A simple but generally inefficient type of associative array is an association list, often called an alist for short, which simply stores a linked list of key-value pairs. Each lookup does a linear search through the list looking for a key match. Such a data structure is commonly used in Lisp/Scheme. Advantages of association lists include:

• It need only be known how to test keys for equality — which is minimal for maps supporting the four basic operations — while the above alternatives require a linear order comparison or a hash function.
• For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
• Insertions are done in constant time by adding the new association to the head of the list.

Specialized representations

If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using radix trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, they're particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing tables. String-keyed maps can avoid extra comparisons during lookups by using tries.

Multimap

A variation of the map (associative array) is the multimap, which is the same as map data structures, but allows a key to be mapped to more than one value. Formally, a multimap can be thought of as a regular associative array that maps unique keys to nonempty sets of values, although actual implementation may vary. C++'s Standard Template Library provides the "multimap" container for the sorted multimap, SGI's STL provides the "hash_multimap" container, which implements a multimap using a hash table, and some varieties of LPC have built-in multimap support.

Language support

Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting. Built-in syntactic support for associative arrays was introduced by SNOBOL4, under the name "table".

MUMPS made multi-dimensional associative arrays, optionally persistent, its key data structure. SETL supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with AWK and including Perl, Tcl, JavaScript, Python, Ruby, and Lua, support associative arrays as a primary container type. In many more languages, they are available as library functions without special syntax. Associative arrays have a variety of names.

In Smalltalk, Objective-C, .NET, Python, and REALbasic they are called dictionaries; in Perl and Ruby they are called hashes; in C++ and Java they are called maps (see map (C++) and Map) and in Common Lisp and Windows PowerShell they are called hashtables (since both typically use this implementation). In PHP all arrays can be associative, except that the keys are limited to integers and strings. In JavaScript, all objects behave as associative arrays.

In Visual FoxPro they are called Collections. In the scripting language Lua, associative arrays, called tables, are used as the primitive building block for all data structures, even arrays. In MUMPS, the associative arrays are typically stored as B-trees.