Cassandra Architecture - Apache Cassandra

What is Cassandra Architecture?

Cassandra is designed to handle big data workloads across multiple nodes without any single point of failure. It has a peer-to-peer distributed architecture, and data is distributed among all the nodes in a cluster.

All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected to other nodes.
Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
When a node goes down, read/write requests can be served from other nodes in the network.

Data Replication in Cassandra

In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data. If it is detected that some of the nodes responded with an out-of-date value, Cassandra will return the most recent value to the client. After returning the most recent value, Cassandra performs a read repair in the background to update the stale values.
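
The behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not Cassandra's implementation: the names `Versioned` and `read_with_repair` are hypothetical, each replica is modelled as a plain dictionary, and the repair runs inline rather than in the background.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # write timestamp; a higher number means a more recent write


def read_with_repair(replicas, key):
    """Return the most recent value for key and bring stale replicas up to date.

    replicas is a list of dicts (one per replica node) mapping key -> Versioned.
    """
    responses = [r[key] for r in replicas if key in r]
    if not responses:
        return None
    latest = max(responses, key=lambda v: v.timestamp)
    # Read repair: overwrite any replica holding an older (or missing) version.
    # Done inline here for simplicity; the text describes it as a background step.
    for r in replicas:
        current = r.get(key)
        if current is None or current.timestamp < latest.timestamp:
            r[key] = latest
    return latest.value


replicas = [
    {"user:1": Versioned("alice@old.example", 1)},
    {"user:1": Versioned("alice@new.example", 2)},
    {"user:1": Versioned("alice@old.example", 1)},
]
result = read_with_repair(replicas, "user:1")
```

After the call, `result` holds the value with the highest timestamp, and all three replica dictionaries carry that same version.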

The following figure shows a schematic view of how Cassandra uses data replication among the nodes in a cluster to ensure no single point of failure.
Figure: data replication

Note − Cassandra uses the Gossip Protocol in the background to allow the nodes to communicate with each other and detect any faulty nodes in the cluster.

Components of Cassandra

The key components of Cassandra are as follows −

  • Node − It is the place where data is stored.
  • Data center − It is a collection of related nodes.
  • Cluster − A cluster is a component that contains one or more data centers.
  • Commit log − The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
  • Mem-table − A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
  • SSTable − It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
  • Bloom filter − Bloom filters are quick, probabilistic data structures for testing whether an element is a member of a set. They may report false positives but never false negatives. They act as a special kind of cache and are consulted on every read query.
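
To make the last component more concrete, here is a minimal Bloom filter sketch in Python. It is a teaching example under stated assumptions, not the structure Cassandra ships: the class name `BloomFilter`, the bit-array size, the number of hashes, and the use of SHA-256 to derive positions are all choices made for this illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for key in ("k1", "k2", "k3"):
    bf.add(key)
```

Every key that was added is reported as possibly present. A key that was never added is usually reported as absent, but it can occasionally come back as a false positive, which is why the filter only narrows down where to look and does not replace the actual lookup.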

Cassandra Query Language

Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers use cqlsh, a prompt for working with CQL, or separate application language drivers.

Clients approach any of the nodes for their read-write operations. That node (the coordinator) acts as a proxy between the client and the nodes holding the data.
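
The coordinator role can be illustrated with a small Python model. Everything here is an assumption made for the sketch: the `Ring` class, the `token` function based on SHA-256, the node names, and the rule of walking clockwise around a token ring to pick replicas are simplified stand-ins, not Cassandra's actual partitioner or replication strategy.

```python
import hashlib
from bisect import bisect_right


def token(value):
    # Hypothetical token function: first 8 bytes of SHA-256 as an integer.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_names, replication_factor=3):
        self.rf = min(replication_factor, len(node_names))
        self.ring = sorted((token(n), n) for n in node_names)
        self.tokens = [t for t, _ in self.ring]

    def replicas_for(self, key):
        # Find the first node past the key's token, then take rf nodes clockwise.
        start = bisect_right(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


def coordinate(ring, contacted_node, key):
    """Whichever node the client contacts acts as coordinator for the request."""
    return {"coordinator": contacted_node, "replicas": ring.replicas_for(key)}


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
via_a = coordinate(ring, "node-a", "user:42")
via_d = coordinate(ring, "node-d", "user:42")
```

The point of the sketch is that `via_a` and `via_d` name different coordinators but the same replica set: the client may contact any node, and the data location is determined by the key alone.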

Write Operations

Every write activity on a node is captured by the commit log written on that node. The data is then captured and stored in the mem-table. Whenever the mem-table is full, its data is written to an SSTable data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.
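
The write path above can be modelled roughly as follows. This is a simplified sketch, assuming a hypothetical `Node` class with an entry-count threshold for the mem-table; it leaves out partitioning, replication, timestamps, and commit-log cleanup, and "discarding unnecessary data" is reduced to dropping overwritten values.

```python
class Node:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory table, latest value per key
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record in the commit log
        self.memtable[key] = value            # 2. store in the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush once the mem-table is full

    def flush(self):
        # Write the mem-table out as a key-sorted table and start a fresh one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        # Merge all SSTables into one; newer tables overwrite older values.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


node = Node(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    node.write(k, v)
```

With a limit of two entries, the five writes produce two flushed SSTables and leave the last write in the mem-table. Calling `node.compact()` then merges the two SSTables into one, keeping only the newer value for key `a`.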

Read Operations

During read operations, Cassandra gets values from the mem-table and checks the bloom filter to find the appropriate SSTable that holds the required data.
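
A rough Python model of this lookup order is shown below. It is a sketch under assumptions: the `SSTable` class and `read` function are hypothetical, a plain Python set stands in for the bloom filter (so it never gives false positives, unlike a real one), and the mem-table is simply treated as holding the newest data.

```python
class SSTable:
    def __init__(self, data):
        self.data = dict(data)
        self.key_filter = set(self.data)  # stand-in for a bloom filter
        self.disk_reads = 0

    def might_contain(self, key):
        return key in self.key_filter

    def get(self, key):
        self.disk_reads += 1  # models the cost of touching the file on disk
        return self.data.get(key)


def read(memtable, sstables, key):
    # 1. Check the mem-table first.
    if key in memtable:
        return memtable[key]
    # 2. Check SSTables from newest to oldest, skipping any whose filter
    #    says the key is definitely not there.
    for table in reversed(sstables):
        if table.might_contain(key):
            value = table.get(key)
            if value is not None:
                return value
    return None


old = SSTable({"a": 1, "b": 2})
new = SSTable({"a": 3, "c": 4})
memtable = {"d": 5}
sstables = [old, new]  # oldest first
```

Reading `d` is served from the mem-table without touching any SSTable, reading `a` returns the value from the newer SSTable, and reading `b` skips the newer SSTable entirely because its filter rules the key out.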

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
