Apache Flume Data Flow

Explain the data flow in Apache Flume

Apache Flume is a framework for moving log data into HDFS. Log servers generate events and log data, and Flume agents running on those servers receive the data. An intermediate node, called a Collector, gathers the data from the agents. Finally, the data is aggregated and pushed into a centralized store such as HDFS or HBase. The data flow of Apache Flume is depicted below:

[Figure: Apache Flume data flow — log servers with agents, a collector, and a centralized store]
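For illustration, here is a minimal single-agent configuration in Flume's properties format. The agent name (agent1), the log file path, and the HDFS URL are assumptions chosen for the sketch, not values from this article:

    # Hypothetical single agent: tails a log file and writes to HDFS.
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = hdfs-sink

    # Source: follow an application log (path is an assumption).
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app.log
    agent1.sources.src1.channels = ch1

    # Channel: buffers events between source and sink.
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    # Sink: push events into a centralized store (HDFS here).
    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events
    agent1.sinks.hdfs-sink.channel = ch1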

Multi-hop Flow

In Apache Flume there can be multiple agents, and an event may travel through more than one agent before reaching its final destination. This is known as multi-hop flow.
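As a sketch, two agents can be chained by pairing an Avro sink on the first hop with an Avro source on the second; the agent names, host name, and port below are placeholders (channel definitions omitted for brevity):

    # Hop 1 (agent1): forward events to the next agent over Avro.
    agent1.sinks.avro-sink.type = avro
    agent1.sinks.avro-sink.hostname = collector.example.com
    agent1.sinks.avro-sink.port = 4141
    agent1.sinks.avro-sink.channel = ch1

    # Hop 2 (collector): receive events from the upstream agent.
    collector.sources.avro-src.type = avro
    collector.sources.avro-src.bind = 0.0.0.0
    collector.sources.avro-src.port = 4141
    collector.sources.avro-src.channels = ch1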

Fan-out Flow

The data flow from one source to multiple channels is known as fan-out flow. It is of two types, illustrated in the sketch after this list −

  • Replicating − The data flow in which the event is replicated to all the configured channels.
  • Multiplexing − The data flow in which the event is sent only to a selected channel, chosen according to a value in the event's header.
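A minimal sketch of both variants, assuming a source named r1 and two channels c1 and c2; the "state" header and its values are illustrative only, and a source uses one selector or the other, not both:

    # Variant A – replicating (the default): every event goes to both channels.
    agent1.sources.r1.channels = c1 c2
    agent1.sources.r1.selector.type = replicating

    # Variant B – multiplexing: route each event by the value of a header.
    agent1.sources.r1.channels = c1 c2
    agent1.sources.r1.selector.type = multiplexing
    agent1.sources.r1.selector.header = state
    agent1.sources.r1.selector.mapping.CA = c1
    agent1.sources.r1.selector.mapping.NY = c2
    agent1.sources.r1.selector.default = c1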

Fan-in Flow

The data flow in which data is transferred from many sources to one channel is known as fan-in flow.
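A sketch of fan-in, assuming two sources (their names and ports are placeholders) feeding the same channel:

    # Two sources write into the same channel ch1.
    agent1.sources = netcat-src avro-src
    agent1.channels = ch1

    # Source 1: plain text lines over TCP.
    agent1.sources.netcat-src.type = netcat
    agent1.sources.netcat-src.bind = 0.0.0.0
    agent1.sources.netcat-src.port = 44444
    agent1.sources.netcat-src.channels = ch1

    # Source 2: events from upstream Flume agents over Avro.
    agent1.sources.avro-src.type = avro
    agent1.sources.avro-src.bind = 0.0.0.0
    agent1.sources.avro-src.port = 4141
    agent1.sources.avro-src.channels = ch1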

Failure Handling

For each event in Apache Flume, two transactions take place: one on the sender side and one on the receiver side. The sender transmits events to the receiver. On receiving the data, the receiver commits its own transaction and sends a "received" signal back to the sender. Only after receiving this signal does the sender commit its transaction, which ensures the event is not lost if the transfer fails partway.
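The same commit-only-after-success idea is visible in Flume's channel transaction API. The Java sketch below is a simplified illustration, not Flume's actual sink code; deliver() is a hypothetical stand-in for the downstream send:

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;

    public class TransactionSketch {
        // Drain one event from a channel inside a transaction.
        public void drain(Channel channel) {
            Transaction tx = channel.getTransaction();
            try {
                tx.begin();
                Event event = channel.take();   // take one event from the channel
                if (event != null) {
                    deliver(event);             // e.g. write to HDFS or forward over Avro
                }
                tx.commit();                    // only now is the event removed for good
            } catch (Exception e) {
                tx.rollback();                  // on failure, the event stays in the channel
            } finally {
                tx.close();
            }
        }

        private void deliver(Event event) {
            // Hypothetical downstream send; omitted in this sketch.
        }
    }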
