Apache Tajo Architecture - Apache Tajo

Describe the architecture of Apache Tajo

The following diagram illustrates the architecture of Apache Tajo:

[Figure: Apache Tajo Architecture]

Each component shown in the diagram is described below.

1. Client: Submits SQL statements to the Tajo Master and receives the query results.
2. Master: The main daemon. It is responsible for query planning and coordinates the workers.
3. Catalog server: Maintains the table and index descriptions. It is embedded in the Master daemon. The catalog server uses Apache Derby as its storage layer and connects to it through a JDBC client.
4. Worker: The Master node assigns tasks to worker nodes, and each TajoWorker processes data. As the number of TajoWorkers increases, the processing capacity also increases linearly.
5. Query Master: The Tajo Master assigns each query to a Query Master, which is responsible for controlling the query's distributed execution plan. The main role of the Query Master is to monitor the running tasks and report on them to the Master node.
6. Node Manager: Manages the resources of a worker node and decides how resource requests are allocated on that node.
7. TaskRunner: Acts as a local query execution engine that runs and monitors query processes. A TaskRunner processes one task at a time. Each task has the following three main attributes:
    • Logical plan: the execution block that created the task.
    • Fragment: an input path, an offset range, and a schema.
    • Fetch URIs: the locations from which the task pulls intermediate data.
8. Query Executor: Executes a query.
9. Storage service: Connects the underlying data storage to Tajo.
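The TaskRunner entry can be pictured as a small data model. The Python sketch below is hypothetical: the class names `Fragment`, `Task`, and `TaskRunner`, their fields, and the sample paths are illustrative assumptions, not Tajo's actual Java API. It only shows a task carrying the three attributes listed above and a runner that executes queued tasks strictly one at a time.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """Input path, byte range [start, start + length), and schema (hypothetical model)."""
    path: str
    start: int
    length: int
    schema: tuple  # column names, e.g. ("id", "name")


@dataclass
class Task:
    """A task with the three attributes listed for the TaskRunner."""
    logical_plan: str  # id of the execution block that created this task
    fragment: Fragment
    fetch_uris: list = field(default_factory=list)  # where intermediate input is pulled from


class TaskRunner:
    """Hypothetical runner that executes queued tasks strictly one at a time."""

    def __init__(self):
        self.queue = deque()
        self.completed = []  # (logical_plan, result) pairs in completion order

    def submit(self, task):
        self.queue.append(task)

    def run(self, execute):
        # FIFO order; the next task starts only after the current one returns.
        while self.queue:
            task = self.queue.popleft()
            self.completed.append((task.logical_plan, execute(task)))
        return self.completed


# Usage: two tasks from the same execution block, each covering part of one file.
schema = ("id", "name")
runner = TaskRunner()
runner.submit(Task("eb_01", Fragment("/tajo/warehouse/users/part-0", 0, 128, schema)))
runner.submit(Task("eb_01", Fragment("/tajo/warehouse/users/part-0", 128, 64, schema)))
done = runner.run(lambda task: task.fragment.length)  # stand-in "execution": report bytes covered
```

Keeping the runner single-threaded mirrors the "one task at a time" rule; parallelism in this picture would come from running several runners, not from one runner juggling tasks.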

Explain the Workflow of Apache Tajo

Tajo uses the Hadoop Distributed File System (HDFS) as its storage layer and has its own query execution engine instead of relying on the MapReduce framework. A Tajo cluster consists of one master node and a number of workers running across the cluster nodes.
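To make the storage side concrete, here is a minimal sketch, assuming an input file on HDFS is cut into fragments (byte offset ranges) no larger than one HDFS block. The function `split_into_fragments`, the 128 MB block size (the usual HDFS default), and the file path are assumptions for illustration; Tajo's real splitting logic may differ.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class Fragment:
    path: str
    start: int   # byte offset where this fragment begins
    length: int  # number of bytes this fragment covers


def split_into_fragments(path, file_size, block_size=128 * MB):
    """Cut a file into consecutive, non-overlapping byte ranges of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    fragments = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        fragments.append(Fragment(path, offset, length))
        offset += length
    return fragments


# A hypothetical 300 MB file yields two full 128 MB fragments and one 44 MB remainder.
frags = split_into_fragments("hdfs:///tajo/warehouse/orders/data.csv", 300 * MB)
```

Aligning fragments with block boundaries is a common choice because it lets a task read its range from a node that already holds that block.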

The master handles query planning and coordinates the workers. It divides each query into small tasks and assigns them to the workers. Each worker has a local query engine that executes a directed acyclic graph (DAG) of physical operators.
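The division of work described above can be sketched under simplifying assumptions: the master creates one task per fragment and assigns tasks to workers round-robin, and each worker evaluates its physical operators as a pull-based (iterator-style) pipeline of Python generators. The operator functions, the sample `orders` rows, and the round-robin policy are hypothetical; Tajo's actual scheduler and operators are implemented in Java and are more elaborate. The plan here is a simple chain, the most basic form of a DAG; an operator such as a join would have two children.

```python
def scan_op(rows):
    """Leaf operator: yields tuples from the task's input fragment."""
    for row in rows:
        yield row


def filter_op(child, predicate):
    """Selection operator: passes on only the tuples that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project_op(child, columns):
    """Projection operator: keeps only the requested column positions."""
    for row in child:
        yield tuple(row[i] for i in columns)


def run_task(fragment_rows):
    """Worker side: physical plan for SELECT order_id, amount FROM orders WHERE amount > 100."""
    root = project_op(filter_op(scan_op(fragment_rows), lambda r: r[2] > 100), (0, 2))
    return list(root)  # pulling from the root drives the whole pipeline


def run_query(fragments, workers):
    """Master side: one task per fragment, assigned round-robin, results gathered."""
    assignment = {w: [] for w in workers}
    for i, frag in enumerate(fragments):
        assignment[workers[i % len(workers)]].append(frag)
    results = []
    for worker in workers:  # sequential here; a real cluster runs workers in parallel
        for frag in assignment[worker]:
            results.extend(run_task(frag))
    return assignment, results


# Hypothetical "orders" rows, already split into three fragments: (order_id, region, amount)
fragments = [
    [(1, "EU", 250.0), (2, "US", 80.0)],
    [(3, "EU", 120.0)],
    [(4, "APAC", 40.0), (5, "US", 310.0)],
]
assignment, results = run_query(fragments, ["worker-1", "worker-2"])
```

Because each operator pulls tuples from its child only on demand, rows flow through the filter and projection one at a time without materializing intermediate lists.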

In addition, Tajo controls distributed data flow more flexibly than MapReduce does, and it supports indexing techniques.

The web-based interface of Tajo has the following capabilities −

  • Option to find how the submitted queries are planned
  • Option to find how the queries are distributed across nodes
  • Option to check the status of the cluster and nodes

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
