Concepts

In this section, we provide a quick overview of core HBase concepts. At a minimum, a passing familiarity will ease the digestion of all that follows.

Whirlwind Tour of the Data Model

Applications store data into labeled tables. Tables are made of rows and columns. Table cells, the intersection of row and column coordinates, are versioned. By default, their version is a timestamp auto-assigned by HBase at the time of cell insertion. A cell’s content is an uninterpreted array of bytes.

Table row keys are also byte arrays, so theoretically anything can serve as a row key, from strings to binary representations of longs, or even serialized data structures. Table rows are sorted by row key, the table’s primary key. The sort is byte-ordered. All table accesses are via the table primary key.
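Because keys are plain bytes and the sort is byte-ordered, applications often design row keys so that the byte order matches the intended read order. As a minimal sketch (the station identifier, the key layout, and the class name are illustrative choices, not a prescribed scheme), the org.apache.hadoop.hbase.util.Bytes utility from the HBase client library can be used to compose such a key:

import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyExample {
  // Hypothetical composite row key: station id bytes followed by
  // (Long.MAX_VALUE - timestamp) so newer observations sort first
  // under the byte-ordered primary key.
  static byte[] makeKey(String stationId, long timestamp) {
    byte[] station = Bytes.toBytes(stationId);
    byte[] reversedTs = Bytes.toBytes(Long.MAX_VALUE - timestamp);
    return Bytes.add(station, reversedTs); // simple concatenation
  }

  public static void main(String[] args) {
    byte[] key = makeKey("011990-99999", 1254069600000L);
    System.out.println(Bytes.toStringBinary(key));
  }
}

Nothing in HBase interprets this key; whether a layout like this is sensible depends entirely on the application’s access pattern.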

Row columns are grouped into column families. All column family members have a common prefix, so, for example, the columns temperature:air and temperature:dew_point are both members of the temperature column family, whereas station:identifier belongs to the station family.

The column family prefix must be composed of printable characters. The qualifying tail, the column family qualifier, can be made of any arbitrary bytes.

* For more detail than is provided here, see the HBase Architecture page on the HBase wiki.

* As of this writing, there are at least two projects up on GitHub that add secondary indices to HBase.

* In HBase, by convention, the colon character (:) delimits the column family from the column family qualifier. It is hardcoded.

A table’s column families must be specified up front as part of the table schema definition, but new column family members can be added on demand. For example, a new column station:address can be offered by a client as part of an update, and its value persisted, as long as the column family station is already in existence on the targeted table.
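To illustrate, here is a hedged sketch using the older HTable-based Java client API: it writes several columns, including a previously unseen station:address qualifier, in a single Put, and reads them back asking for multiple versions. The table name observations, the row key, and the cell values are hypothetical; only the temperature and station families are assumed to have been declared in the table schema:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
    HTable table = new HTable(conf, "observations");  // hypothetical table
    try {
      byte[] row = Bytes.toBytes("011990-99999:20090927");

      // One Put may carry any number of columns for the row.
      Put p = new Put(row);
      p.add(Bytes.toBytes("temperature"), Bytes.toBytes("air"), Bytes.toBytes("-11"));
      p.add(Bytes.toBytes("temperature"), Bytes.toBytes("dew_point"), Bytes.toBytes("-16"));
      p.add(Bytes.toBytes("station"), Bytes.toBytes("identifier"), Bytes.toBytes("011990-99999"));
      // A qualifier never seen before is fine, as long as the
      // 'station' family already exists in the table schema.
      p.add(Bytes.toBytes("station"), Bytes.toBytes("address"), Bytes.toBytes("Reykjavik"));
      table.put(p);

      // Read the row back, asking for up to three versions per cell.
      Get g = new Get(row);
      g.setMaxVersions(3);
      Result result = table.get(g);
      byte[] air = result.getValue(Bytes.toBytes("temperature"), Bytes.toBytes("air"));
      System.out.println("temperature:air = " + Bytes.toString(air));
    } finally {
      table.close();
    }
  }
}

Because all of the columns travel in one Put against one row, they are applied atomically, which anticipates the locking discussion below.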

Physically, all column family members are stored together on the filesystem. So, though earlier we described HBase as a column-oriented store, it would be more accurate if it were described as a column-family-oriented store. Because tunings and storage specifications are done at the column family level, it is advised that all column family members have the same general access pattern and size characteristics.

In synopsis, HBase tables are like those in an RDBMS, only cells are versioned, rows are sorted, and columns can be added on the fly by the client as long as the column family they belong to preexists.

Regions

Tables are automatically partitioned horizontally by HBase into regions. Each region comprises a subset of a table’s rows. A region is denoted by the table it belongs to, its first row, inclusive, and last row, exclusive. Initially, a table comprises a single region, but as the region grows, after it crosses a configurable size threshold, it splits at a row boundary into two new regions of approximately equal size. Until this first split happens, all loading will be against the single server hosting the original region.
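The threshold is an ordinary configuration property. As a sketch, the hbase.hregion.max.filesize property governs the split size; the value below is purely illustrative, not a recommended or default setting:

<!-- inside the <configuration> element of conf/hbase-site.xml -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- split a region once its stored data grows past ~1 GB -->
</property>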

As the table grows, the number of its regions grows. Regions are the units that get distributed over an HBase cluster. In this way, a table that is too big for any one server can be carried by a cluster of servers with each node hosting a subset of the table’s total regions. This is also the means by which the loading on a table gets distributed. The online set of sorted regions comprises the table’s total content.

Locking

Row updates are atomic, no matter how many row columns constitute the row-level transaction. This keeps the locking model simple.

Implementation

Just as HDFS and MapReduce are built of clients, slaves, and a coordinating master (namenode and datanodes in HDFS, jobtracker and tasktrackers in MapReduce), so is HBase modeled, with an HBase master node orchestrating a cluster of one or more regionserver slaves. The HBase master is responsible for bootstrapping a virgin install, for assigning regions to registered regionservers, and for recovering regionserver failures. The master node is lightly loaded. The regionservers carry zero or more regions and field client read/write requests. They also manage region splits, informing the HBase master about the new daughter regions so that it can manage the offlining of the parent region and the assignment of the replacement daughters.


HBase depends on ZooKeeper (see the ZooKeeper chapter), and by default it manages a ZooKeeper instance as the authority on cluster state. HBase hosts vitals such as the location of the root catalog table and the address of the current cluster master in ZooKeeper. Assignment of regions is mediated via ZooKeeper in case participating servers crash mid-assignment. Hosting the assignment transaction state in ZooKeeper means that recovery can pick up the assignment where the crashed server left off. At a minimum, when bootstrapping a client connection to an HBase cluster, the client must be passed the location of the ZooKeeper ensemble. Thereafter, the client navigates the ZooKeeper hierarchy to learn cluster attributes such as server locations.
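As a sketch of that bootstrap step, a client that does not pick up an hbase-site.xml from its classpath can name the ensemble programmatically. The hostnames below are hypothetical; hbase.zookeeper.quorum is the property the client consults:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientBootstrap {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Everything else (root catalog location, master address, region
    // locations) is discovered by navigating ZooKeeper from here.
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
    // The client-side port defaults to 2181; shown only for completeness.
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    // ... pass conf to HTable (or similar) as in the earlier example.
  }
}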

Regionserver slave nodes are listed in the HBase conf/regionservers file, as you would list datanodes and tasktrackers in the Hadoop conf/slaves file. Start and stop scripts are like those in Hadoop, using the same SSH-based mechanism for running remote commands. Cluster site-specific configuration is made in the HBase conf/hbase-site.xml and conf/hbase-env.sh files, which have the same format as their equivalents up in the Hadoop parent project (see the Setting Up a Hadoop Cluster chapter).

Where there is commonality to be found, HBase directly uses or subclasses the parent Hadoop implementation, whether a service or type.

When this is not possible, HBase will follow the Hadoop model where it can. For example, HBase uses the Hadoop Configuration system, so configuration files have the same format. What this means for you, the user, is that you can leverage any Hadoop familiarity in your exploration of HBase. HBase deviates from this rule only when adding its specializations.

HBase persists data via the Hadoop filesystem API. Since there are multiple implementations of the filesystem interface (one for the local filesystem, one for the KFS filesystem, one for Amazon’s S3, and one for HDFS, the Hadoop Distributed Filesystem), HBase can persist to any of these implementations. Most experience, though, has been gained using HDFS, although by default, unless told otherwise, HBase writes to the local filesystem. The local filesystem is fine for experimenting with your initial HBase install, but thereafter, usually the first configuration made in an HBase cluster involves pointing HBase at the HDFS cluster to use.
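A minimal sketch of that first configuration, with a hypothetical namenode address, sets hbase.rootdir in conf/hbase-site.xml (hbase.cluster.distributed additionally moves HBase out of standalone mode):

<?xml version="1.0"?>
<!-- conf/hbase-site.xml: point HBase at an HDFS cluster (hostname is hypothetical) -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>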

HBase in operation

HBase, internally, keeps special catalog tables named -ROOT- and .META. within which it maintains the current list, state, and location of all regions afloat on the cluster. The -ROOT- table holds the list of .META. table regions. The .META. table holds the list of all user-space regions. Entries in these tables are keyed by region name, where a region name is made of the name of the table the region belongs to, the region’s start row, its time of creation, and finally, an MD5 hash of all of the former (i.e., a hash of table name, start row, and creation timestamp).* Row keys, as noted previously, are sorted, so finding the region that hosts a particular row is a matter of a lookup to find the first entry whose key is greater than or equal to that of the requested row key. As regions transition (they are split, disabled/enabled, deleted, redeployed by the region load balancer, or redeployed due to a regionserver crash), the catalog tables are updated so the state of all regions on the cluster is kept current.

* Here is an example region name from the table TestTable whose start row is xyz: TestTable,xyz,1279729913622.1b6e176fb8d8aa88fd4ab6bc80247ece. A comma delimits the table name, start row, and timestamp. The name always ends in a period.

Fresh clients connect to the ZooKeeper cluster first to learn the location of -ROOT-. Clients consult -ROOT- to elicit the location of the .META. region whose scope covers that of the requested row. The client then does a lookup against the found .META. region to figure the hosting user-space region and its location. Thereafter, the client interacts directly with the hosting regionserver.

To save on having to make three round-trips per row operation, clients cache all they learn traversing -ROOT- and .META., caching locations as well as user-space region start and stop rows, so they can figure out hosting regions themselves without having to go back to the .META. table. Clients continue to use the cached entries as they work, until there is a fault. When this happens, it is because the region has moved, and the client consults the .META. table again to learn the new location. If, in turn, the consulted .META. region has moved, then -ROOT- is reconsulted.
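The following is a simplified model, not HBase’s actual client code, of what such a cache amounts to: a sorted map keyed by region start row, where locating the hosting region is a floor lookup and a fault simply evicts the stale entry:

import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.hbase.util.Bytes;

public class RegionCacheSketch {
  // start row of a region -> "host:port" of the regionserver carrying it
  private final TreeMap<byte[], String> cache =
      new TreeMap<byte[], String>(Bytes.BYTES_COMPARATOR);

  void remember(byte[] startRow, String location) {
    cache.put(startRow, location);
  }

  // Regions are contiguous and sorted, so the region hosting 'row' is the
  // one with the greatest start row <= row (end-row checks omitted here).
  String locate(byte[] row) {
    Map.Entry<byte[], String> e = cache.floorEntry(row);
    return e == null ? null : e.getValue(); // null means: go back to .META.
  }

  void evict(byte[] startRow) {
    cache.remove(startRow); // on a fault, drop the stale entry and re-lookup
  }
}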

Writes arriving at a regionserver are first appended to a commit log and then are added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem.
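The sketch below models this write path in simplified form; it is not HBase’s internal implementation, and the flush threshold shown is illustrative rather than a claimed default:

import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

import org.apache.hadoop.hbase.util.Bytes;

public class WritePathSketch {
  // In HBase the commit log lives on HDFS so it survives a regionserver crash.
  interface CommitLog {
    void append(byte[] row, byte[] cell);
  }

  private final CommitLog log;
  private final NavigableMap<byte[], byte[]> memstore =
      new ConcurrentSkipListMap<byte[], byte[]>(Bytes.BYTES_COMPARATOR);
  private long memstoreSize = 0;
  private static final long FLUSH_THRESHOLD = 64 * 1024 * 1024; // illustrative value

  WritePathSketch(CommitLog log) {
    this.log = log;
  }

  void write(byte[] row, byte[] cell) {
    log.append(row, cell);   // 1. append to the commit log first, for durability
    memstore.put(row, cell); // 2. then add to the in-memory memstore
    memstoreSize += row.length + cell.length;
    if (memstoreSize >= FLUSH_THRESHOLD) {
      flush();               // 3. a full memstore is flushed to the filesystem
    }
  }

  private void flush() {
    // Write the sorted memstore content out as a new immutable flush file,
    // then start afresh with an empty memstore (file writing omitted here).
    memstore.clear();
    memstoreSize = 0;
  }
}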

The commit log is hosted on HDFS, so it remains available through a regionserver crash. When the master notices that a regionserver is no longer reachable, usually because the server’s znode has expired in ZooKeeper, it splits the dead regionserver’s commit log by region. On reassignment, and before they open for business, regions that were on the dead regionserver will pick up their just-split file of not-yet-persisted edits and replay them to bring themselves up to date with the state they had just before the failure.

When reading, the region’s memstore is consulted first. If sufficient versions are found reading the memstore alone, the query returns. Otherwise, flush files are consulted in order, from newest to oldest, either until versions sufficient to satisfy the query are found or until we run out of flush files.
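Again as a simplified model rather than HBase’s actual classes, the read path amounts to the following:

import java.util.ArrayList;
import java.util.List;

public class ReadPathSketch {
  // A source of cell versions: the memstore, or one flush file.
  interface CellSource {
    List<byte[]> versionsOf(byte[] row, byte[] column);
  }

  List<byte[]> read(byte[] row, byte[] column, int maxVersions,
                    CellSource memstore, List<CellSource> flushFilesNewestFirst) {
    List<byte[]> found = new ArrayList<byte[]>();
    found.addAll(memstore.versionsOf(row, column));
    if (found.size() >= maxVersions) {
      return found.subList(0, maxVersions);   // the memstore alone was sufficient
    }
    for (CellSource file : flushFilesNewestFirst) {
      found.addAll(file.versionsOf(row, column));
      if (found.size() >= maxVersions) {
        break;                                // enough versions; stop consulting files
      }
    }
    return found.size() > maxVersions ? found.subList(0, maxVersions) : found;
  }
}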

A background process compacts flush files once their number has exceeded a threshold, rewriting many files as one, because the fewer files a read consults, the more performant it will be. On compaction, versions beyond the schema-configured maximum, deletes, and expired cells are cleaned out. A separate process running in the regionserver monitors flush file sizes, splitting the region when they grow in excess of the configured maximum.
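The compaction trigger is also an ordinary configuration property. In older HBase releases it is named hbase.hstore.compactionThreshold (the value below is illustrative); the split maximum is the hbase.hregion.max.filesize property shown earlier:

<!-- inside the <configuration> element of conf/hbase-site.xml -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value> <!-- compact a store once it has accumulated this many flush files -->
</property>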

