Apache Storm Cluster Architecture - Apache Storm

What is Apache Storm Cluster Architecture?

One of the main highlights of Apache Storm is that it is a fast, fault-tolerant distributed application with no single point of failure (SPOF). Apache Storm can be installed on as many machines as needed to increase the capacity of the application.

Let’s have a look at how the Apache Storm cluster is designed and its internal architecture. The following diagram depicts the cluster design.

Figure: Apache Storm - Cluster Architecture


Apache Storm has two types of nodes: Nimbus (the master node) and Supervisor (a worker node). Nimbus is the central component of Apache Storm, and its main job is to run the Storm topology. Nimbus analyzes the topology and gathers the tasks to be executed. It then distributes the tasks to an available supervisor.
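The distribution step described above can be sketched in plain Java. This is only an illustration: the class `NimbusSketch`, the method `assign`, and the round-robin policy are assumptions made for the example. They are not Storm APIs and not necessarily how Storm's scheduler decides placement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Nimbus distribution step; not Storm's real scheduler.
final class NimbusSketch {

    // Assigns task ids 0..taskCount-1 to the given supervisors in round-robin order.
    static Map<String, List<Integer>> assign(int taskCount, List<String> supervisors) {
        if (supervisors.isEmpty()) {
            throw new IllegalArgumentException("no supervisor available");
        }
        // LinkedHashMap keeps the supervisors in the order they were given.
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        for (String supervisor : supervisors) {
            plan.put(supervisor, new ArrayList<>());
        }
        for (int task = 0; task < taskCount; task++) {
            String target = supervisors.get(task % supervisors.size());
            plan.get(target).add(task);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Five tasks spread over two supervisors.
        System.out.println(assign(5, Arrays.asList("supervisor-a", "supervisor-b")));
    }
}
```

Running `main` prints `{supervisor-a=[0, 2, 4], supervisor-b=[1, 3]}`: every task lands on exactly one supervisor, and the load differs by at most one task.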

A supervisor has one or more worker processes and delegates the tasks to them. Each worker process spawns as many executors as needed to run the tasks. Apache Storm uses an internal distributed messaging system for communication between Nimbus and the supervisors.

Components

Description

Nimbus

Nimbus is the master node of a Storm cluster. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing the topology code among the worker nodes, assigning tasks to them, and monitoring for failures.

Supervisor

The nodes that follow the instructions given by Nimbus are called supervisors. A supervisor has multiple worker processes, and it governs them to complete the tasks assigned by Nimbus.

Worker process

A worker process executes tasks related to a specific topology. It does not run a task by itself; instead, it creates executors and asks them to perform particular tasks. A worker process can have multiple executors.

Executor

An executor is a single thread spawned by a worker process. It runs one or more tasks, but only for a specific spout or bolt.

Task

A task performs the actual data processing. Each task is an instance of either a spout or a bolt.
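To make the worker process, executor, and task relationship concrete, here is a minimal plain-Java model. It assumes that the tasks of one spout or bolt are split as evenly as possible among that component's executors, and that each executor is one thread running its tasks one after another. The names `ExecutorSketch`, `tasksPerExecutor`, and `runWorker` are hypothetical and are not part of Storm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; these names are hypothetical and are not Storm APIs.
final class ExecutorSketch {

    // Splits task ids 0..numTasks-1 into numExecutors groups whose sizes differ by at most one.
    static List<List<Integer>> tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need at least one task per executor");
        }
        List<List<Integer>> groups = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors;
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(next++);
            }
            groups.add(group);
        }
        return groups;
    }

    // Plays the role of a worker process: starts one thread per executor and
    // records which thread ran each task. Each executor runs its tasks serially.
    static Map<Integer, String> runWorker(List<List<Integer>> groups) {
        Map<Integer, String> threadOfTask = new ConcurrentHashMap<>();
        List<Thread> executors = new ArrayList<>();
        for (int e = 0; e < groups.size(); e++) {
            List<Integer> tasks = groups.get(e);
            Thread executor = new Thread(() -> {
                for (int task : tasks) {
                    threadOfTask.put(task, Thread.currentThread().getName());
                }
            }, "executor-" + e);
            executors.add(executor);
            executor.start();
        }
        for (Thread executor : executors) {
            try {
                executor.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for executors", ex);
            }
        }
        return threadOfTask;
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = tasksPerExecutor(5, 2);
        System.out.println(groups);
        System.out.println(runWorker(groups));
    }
}
```

With five tasks and two executors, `tasksPerExecutor` yields `[[0, 1, 2], [3, 4]]`, so tasks 0 to 2 all run on the thread named `executor-0` and tasks 3 and 4 on `executor-1`.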

ZooKeeper framework

Apache ZooKeeper is a service used by a cluster (a group of nodes) to coordinate among its nodes and to maintain shared data with robust synchronization techniques. Nimbus is stateless, so it depends on ZooKeeper to monitor the status of the worker nodes.

ZooKeeper helps the supervisors interact with Nimbus. It is responsible for maintaining the state of Nimbus and the supervisors.
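One way to picture status monitoring through a shared coordination service is a heartbeat table with a timeout. The sketch below is a plain-Java illustration under that assumption: `HeartbeatSketch` is a hypothetical in-memory stand-in for the data kept in ZooKeeper, and none of its methods are ZooKeeper or Storm APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for coordination data kept in ZooKeeper.
final class HeartbeatSketch {
    // Last heartbeat time per supervisor; TreeMap keeps ids sorted for stable output.
    private final Map<String, Long> lastBeatMillis = new TreeMap<>();
    private final long timeoutMillis;

    HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Supervisor side: record that the supervisor was alive at the given time.
    void beat(String supervisorId, long nowMillis) {
        lastBeatMillis.put(supervisorId, nowMillis);
    }

    // Nimbus side: list supervisors whose last heartbeat is older than the timeout.
    List<String> deadSupervisors(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatSketch registry = new HeartbeatSketch(30_000);
        registry.beat("supervisor-a", 0);
        registry.beat("supervisor-b", 20_000);
        // At t = 40 s, only supervisor-a has been silent for more than 30 s.
        System.out.println(registry.deadSupervisors(40_000));
    }
}
```

In this model Nimbus and the supervisors communicate only through the shared table, which mirrors the idea that ZooKeeper sits between them.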

Storm is stateless in nature. Even though statelessness has its own disadvantages, it helps Storm process real-time data in the best and quickest possible way.

Storm is not completely stateless, though. It stores its state in Apache ZooKeeper. Since the state is available in ZooKeeper, a failed Nimbus can be restarted and made to work from where it left off. Usually, service monitoring tools like monit will monitor Nimbus and restart it if there is any failure.
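The restart behaviour described above follows from keeping state outside the process. A minimal plain-Java sketch, assuming a hypothetical `StateStore` that stands in for ZooKeeper and a hypothetical `Nimbus` class that keeps nothing durable of its own:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: StateStore stands in for ZooKeeper, Nimbus for the master process.
final class RecoverySketch {

    // External store that outlives any single Nimbus instance.
    static final class StateStore {
        private final Map<String, String> assignments = new HashMap<>();

        void put(String topology, String supervisor) {
            assignments.put(topology, supervisor);
        }

        Map<String, String> snapshot() {
            return new HashMap<>(assignments);
        }
    }

    // Holds no state of its own; every read and write goes to the store.
    static final class Nimbus {
        private final StateStore store;

        Nimbus(StateStore store) {
            this.store = store;
        }

        void assign(String topology, String supervisor) {
            store.put(topology, supervisor);
        }

        Map<String, String> knownAssignments() {
            return store.snapshot();
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        Nimbus first = new Nimbus(store);
        first.assign("word-count", "supervisor-a");

        // Simulate a crash and restart: a fresh instance attached to the same store.
        Nimbus restarted = new Nimbus(store);
        System.out.println(restarted.knownAssignments());
    }
}
```

Because the restarted instance reads the same store, it sees the `word-count` assignment made before the crash and can continue from there.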

Apache Storm also has an advanced topology called the Trident topology, which supports state maintenance and provides a high-level API like Pig. We will discuss all of these features in the coming chapters.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
