Clients - Hadoop

There are a number of client options for interacting with an HBase cluster.


Java

HBase, like Hadoop, is written in Java. The following example shows how to do in Java the shell operations listed previously in “Test Drive”.

Example. Basic table administration and access

This class has a main method only. For the sake of brevity, we do not include the package name or imports. In this class, we first create an instance of org.apache.hadoop.conf.Configuration. We ask the org.apache.hadoop.hbase.HBaseConfiguration class to create the instance. It returns a Configuration that has read the HBase configuration from the hbase-site.xml and hbase-default.xml files found on the program’s classpath. This Configuration is subsequently used to create instances of HBaseAdmin and HTable, two classes found in the org.apache.hadoop.hbase.client Java package. HBaseAdmin is used for administering your HBase cluster, for example adding and dropping tables. HTable is used to access a specific table. The Configuration instance points these classes at the cluster the code is to work against.

To create a table, we first create an instance of HBaseAdmin and then ask it to create the table named test with a single column family named data. In our example, the table schema is the default. Use methods on org.apache.hadoop.hbase.HTableDescriptor and org.apache.hadoop.hbase.HColumnDescriptor to change the table schema.

The code next asserts that the table was actually created and then moves on to run operations against the just-created table.

To operate on a table, we need an instance of org.apache.hadoop.hbase.client.HTable, which we construct by passing it our Configuration instance and the name of the table we want to operate on. After creating an HTable, we create an instance of org.apache.hadoop.hbase.client.Put to put a single cell value of value1 into a row named row1 on the column named data:1. (The column name is specified in two parts: the column family name as bytes, databytes in the example code, and then the column family qualifier, specified as Bytes.toBytes("1").) Next we create an org.apache.hadoop.hbase.client.Get, do a get of the just-added cell, and then use an org.apache.hadoop.hbase.client.Scan to scan over the just-created table, printing out what we find.
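The two-part column name can be illustrated without any HBase dependency. The sketch below is only a plain-Java model: ColumnName and its parse() method are hypothetical names invented for this illustration, not HBase classes. It splits data:1 into the family and qualifier byte arrays that a Put would be given, assuming UTF-8 encoding, which is what Bytes.toBytes(String) uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not an HBase class) that splits "family:qualifier". */
final class ColumnName {
    final byte[] family;
    final byte[] qualifier;

    private ColumnName(byte[] family, byte[] qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    /** Splits on the first ':'; everything after it is treated as the qualifier. */
    static ColumnName parse(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        byte[] family = column.substring(0, colon).getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = column.substring(colon + 1).getBytes(StandardCharsets.UTF_8);
        return new ColumnName(family, qualifier);
    }

    public static void main(String[] args) {
        ColumnName column = parse("data:1");
        // These two arrays play the role of the family and qualifier arguments.
        System.out.println("family bytes:    " + Arrays.toString(column.family));
        System.out.println("qualifier bytes: " + Arrays.toString(column.qualifier));
    }
}
```

In the real client, the family and qualifier are passed to Put as two separate byte arrays, so no parsing of a combined name is needed. The helper only makes the naming convention visible.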

Finally, we clean up by first disabling the table and then deleting it. A table must be disabled before it can be dropped.


MapReduce

HBase classes and utilities in the org.apache.hadoop.hbase.mapreduce package facilitate using HBase as a source and/or sink in MapReduce jobs. The TableInputFormat class makes splits on region boundaries so that each map is handed a single region to work on. The TableOutputFormat class writes the result of the reduce into HBase. The RowCounter class in the following example can be found in the HBase mapreduce package. It runs a map task to count rows using TableInputFormat.
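To make the idea of region-aligned splits concrete, here is a simplified plain-Java model. It is not HBase's TableInputFormat code; RegionSplits, Split, and splitsFor() are hypothetical names, and the region start keys are made up. It assumes the usual HBase convention that the first region has an empty start key and the last region an empty end key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of region-aligned input splits; not the HBase implementation. */
final class RegionSplits {

    /** A key range [startRow, endRow); an empty endRow means "to the end of the table". */
    static final class Split {
        final String startRow;
        final String endRow;

        Split(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[\"" + startRow + "\", \"" + endRow + "\")";
        }
    }

    /** Produces one split per region: each region ends where the next one starts. */
    static List<Split> splitsFor(List<String> sortedRegionStartKeys) {
        List<Split> splits = new ArrayList<>();
        int n = sortedRegionStartKeys.size();
        for (int i = 0; i < n; i++) {
            String start = sortedRegionStartKeys.get(i);
            String end = (i + 1 < n) ? sortedRegionStartKeys.get(i + 1) : "";
            splits.add(new Split(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions.
        List<String> startKeys = Arrays.asList("", "row-g", "row-p");
        for (Split split : splitsFor(startKeys)) {
            System.out.println("one map task over " + split);
        }
    }
}
```

Because each split covers exactly one region, the number of map tasks in this model equals the number of regions in the table.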

Example. A MapReduce application to count the number of rows in an HBase table

This class uses GenericOptionsParser, which is discussed in “GenericOptionsParser, Tool, and ToolRunner”, for parsing command-line arguments. The RowCounterMapper inner class implements the HBase TableMapper abstract class, a specialization of org.apache.hadoop.mapreduce.Mapper that sets the map input types passed by TableInputFormat. The createSubmittableJob() method parses the arguments that were passed on the command line and added to the configuration, to figure out the table and columns we are to run RowCounter against. The result of the parse is used to configure an instance of org.apache.hadoop.hbase.client.Scan, a scan object that will be passed through to TableInputFormat and used to constrain what our Mapper sees. Notice how we set a filter, an instance of org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter, on the scan. This filter instructs the server to short-circuit when running server-side, doing no more than verifying that a row has an entry before returning. This speeds up the row count. The createSubmittableJob() method also invokes the TableMapReduceUtil.initTableMapperJob() utility method, which, among other things such as setting the map class to use, sets the input format to TableInputFormat. The map is simple. It checks for empty values. If the values are empty, it doesn’t count the row. Otherwise, it increments Counters.ROWS by one.
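The counting logic described above can be modeled in plain Java. This is a sketch under stated assumptions, not the RowCounter source: RowCountModel, firstCellOnly(), and countRows() are hypothetical names, the table is an in-memory map from row key to cell values, and firstCellOnly() merely stands in for the effect of the first-key-only filter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the row-count logic; it does not use the HBase API. */
final class RowCountModel {

    /** Stand-in for the first-key-only filter: keep just the first cell of a row. */
    static List<byte[]> firstCellOnly(List<byte[]> cells) {
        return cells.isEmpty() ? cells : cells.subList(0, 1);
    }

    /** Stand-in for the map step: a row counts if a returned cell has a nonempty value. */
    static long countRows(Map<String, List<byte[]>> table) {
        long rows = 0;
        for (List<byte[]> cells : table.values()) {
            for (byte[] value : firstCellOnly(cells)) {
                if (value.length > 0) {
                    rows++;   // plays the role of incrementing the ROWS counter
                    break;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up data: row3 holds only an empty value and is therefore skipped.
        Map<String, List<byte[]>> table = new LinkedHashMap<>();
        table.put("row1", Arrays.asList(new byte[] {1}, new byte[] {2}, new byte[] {3}));
        table.put("row2", Collections.singletonList(new byte[] {4}));
        table.put("row3", Collections.singletonList(new byte[0]));
        System.out.println("rows counted: " + countRows(table));
    }
}
```

The point of the filter in this model is that row1's second and third cells are never looked at: one cell per row is enough to decide whether the row exists.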

Avro, REST, and Thrift

HBase ships with Avro, REST, and Thrift interfaces. These are useful when the interacting application is written in a language other than Java. In all cases, a Java server hosts an instance of the HBase client, brokering the application's Avro, REST, and Thrift requests in and out of the HBase cluster. The extra work of proxying requests and responses means these interfaces are slower than using the Java client directly.


REST

To put up a Stargate instance (Stargate is the name for the HBase REST service), start it using the following command:

    hbase-daemon.sh start rest

This will start a server instance, by default on port 8080, background it, and capture any output from the server in logfiles under the HBase logs directory. Clients can ask for the response to be formatted as JSON, Google’s protocol buffers, or XML, depending on how the client HTTP Accept header is set. See the REST wiki page for documentation and examples of making REST client requests.
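As a rough illustration of selecting the response format through the Accept header, the following sketch builds HTTP requests with the JDK's java.net.http API (Java 11 or later) without sending them. The class and method names are hypothetical, and the host, the /version resource path, and the three media type strings are assumptions to be checked against the REST documentation for your HBase version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch of choosing a REST response format through the HTTP Accept header. */
final class RestAcceptExample {

    /** Builds (but does not send) a GET request that asks for the given media type. */
    static HttpRequest build(String baseUrl, String path, String mediaType) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Accept", mediaType)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumed values: a REST server on localhost at the default port 8080,
        // and "/version" as an example resource path.
        String base = "http://localhost:8080";
        String[] mediaTypes = {"application/json", "text/xml", "application/x-protobuf"};
        for (String mediaType : mediaTypes) {
            HttpRequest request = build(base, "/version", mediaType);
            System.out.println(request.method() + " " + request.uri()
                    + "  Accept: " + request.headers().firstValue("Accept").orElse(""));
        }
    }
}
```

To actually issue one of these requests against a running REST server, it can be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).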

To stop the REST server, type:

    hbase-daemon.sh stop rest


Thrift

Similarly, start a Thrift service by putting up a server to field Thrift clients by running the following:

    hbase-daemon.sh start thrift

This will start the server instance, by default on port 9090, background it, and capture any output from the server in logfiles under the HBase logs directory. The HBase Thrift documentation notes the Thrift version used to generate the classes. The HBase Thrift IDL can be found at src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift in the HBase source code.

To stop the Thrift server, type:

    hbase-daemon.sh stop thrift


Avro

The Avro server is started and stopped in the same manner as the Thrift or REST services. The Avro server by default uses port 9090.

