Cassandra vs. HBase

Cassandra is a free, open-source NoSQL database designed to manage huge data sets across large clusters. The CAP theorem names three properties a distributed system would like to provide (consistency, availability and partition tolerance) and holds that it cannot fully guarantee all three at once; Cassandra's architecture favors availability and partition tolerance. Cassandra's data model was influenced by Google's Bigtable, while its distribution mechanism was inherited from Amazon's Dynamo: nodes in a Cassandra cluster are completely symmetrical and all have identical responsibilities. Cassandra also uses Dynamo-style consistent hashing for partitioning and data replication.
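To make "consistent hashing for partitioning and replication" more concrete, here is a minimal sketch of a hash ring in Python. It is a simplification, not Cassandra's implementation: each node gets a single token derived from an MD5 hash of its name (Cassandra's default Murmur3 partitioner and its virtual nodes are not modeled), and a key is stored on the first node at or after the key's token, plus the next nodes around the ring up to the replication factor. The class `HashRing`, the function `token` and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5 hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Simplified consistent-hash ring: one token per node, no virtual nodes."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sorted (token, node) pairs form the ring.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str):
        """Nodes holding `key`: the first node whose token is >= the key's
        token (wrapping around), followed by the next rf - 1 nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes, same answer every time
```

Because each node owns only the arc of the ring that ends at its token, adding a node moves only the keys in the arc it takes over; every other key keeps its primary replica.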

HBase is often described as the Hadoop database. Hadoop is the name of an entire ecosystem of technologies, and HBase builds on some of them (notably HDFS for storage) to provide a distributed, column-oriented database modeled on the architecture of Google's Bigtable. As a column-oriented database, HBase offers high scalability, reliability and flexibility. HBase requires tables and their column families to be defined in advance, but individual columns can be added at write time, and new column families can be added later by altering the table. HBase is designed to support queries over large data sets and is optimized for reads. It also aims to keep writes consistent: in CAP terms, HBase favors consistency and partition tolerance.
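The rule that column families are fixed up front while individual columns are not can be illustrated with a small in-memory model of the HBase data model (row key, then column family, then column qualifier, then value). This is a hypothetical sketch, not the HBase client API: `ToyTable`, `put`, `get` and `add_family` are invented names, and real HBase also keeps timestamped versions of each cell, which are omitted here.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table:
    row key -> column family -> column qualifier -> value.
    Not the HBase client API; cell versions/timestamps are omitted."""

    def __init__(self, name, column_families):
        self.name = name
        # Column families are part of the schema and fixed at creation time.
        self.families = set(column_families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def add_family(self, family):
        # Models an explicit schema change (altering the table).
        self.families.add(family)

    def put(self, row_key, column, value):
        """Write one cell; `column` is written as 'family:qualifier'."""
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Qualifiers need no declaration: any new one is simply stored.
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key):
        """Return a row as nested plain dicts ({} if the row does not exist)."""
        row = self.rows.get(row_key, {})
        return {family: dict(cells) for family, cells in row.items()}


users = ToyTable("users", ["info"])
users.put("user1", "info:name", "Ada")
users.put("user1", "info:email", "ada@example.com")  # new column, no schema change
print(users.get("user1"))
# {'info': {'name': 'Ada', 'email': 'ada@example.com'}}
```

Writing to an undeclared family such as `"stats:logins"` raises `KeyError` until `add_family("stats")` has been called, mirroring the need to alter the table first.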

Though both draw on the principles of Google's Bigtable, there are significant differences between Cassandra and HBase. Cassandra nodes are symmetrical, meaning clients can connect to any node in the cluster. However, Cassandra requires the user to designate some nodes as seed nodes, which serve as contact points for communication within the cluster: a node joining the cluster contacts a seed to discover the other nodes. HBase requires the user to designate some nodes as master nodes, whose function is to monitor and coordinate the actions of the region servers. Cassandra supports high availability by allowing multiple seed nodes in a cluster, while HBase achieves the same through standby master nodes: if the current master fails, a standby master takes over as the new master.
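Standby master failover can be sketched as leader election over an exclusive lock that disappears when its holder dies. HBase does something along these lines using an ephemeral ZooKeeper node, but the sketch below is a heavily simplified, single-process model: `MasterLock`, `elect` and the server names are hypothetical, and the lock is a plain in-memory object rather than ZooKeeper.

```python
from typing import List, Optional


class MasterLock:
    """In-memory stand-in for an ephemeral lock in a coordination service:
    at most one holder, released when the holder's session ends."""

    def __init__(self):
        self.holder: Optional[str] = None

    def try_acquire(self, server: str) -> bool:
        # Succeeds only if nobody currently holds the lock.
        if self.holder is None:
            self.holder = server
            return True
        return False

    def session_expired(self, server: str) -> None:
        # Simulates the holder crashing: its ephemeral lock disappears.
        if self.holder == server:
            self.holder = None


def elect(lock: MasterLock, live_candidates: List[str]) -> Optional[str]:
    """Each live candidate tries to take the lock; the first one wins and
    becomes the active master, the others remain standbys."""
    for server in live_candidates:
        lock.try_acquire(server)
    return lock.holder


lock = MasterLock()
masters = ["master-1", "master-2", "master-3"]
print(elect(lock, masters))        # master-1
lock.session_expired("master-1")   # the active master fails
print(elect(lock, masters[1:]))    # master-2
```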

Cassandra uses the Gossip protocol for inter-node communication, and the gossip service is built into the Cassandra software. HBase relies on ZooKeeper, a separate distributed coordination service, for the corresponding tasks. HBase ships with a bundled ZooKeeper, but it can also use a pre-existing ZooKeeper ensemble. Neither Cassandra nor HBase supports full multi-row transactions, but both provide some level of consistency. HBase offers strong consistency at the record (row) level. Cassandra offers eventual consistency that is tunable: the consistency level can be set separately for each read and each write, controlling how many replicas must respond before the operation succeeds.
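Tunable consistency comes down to simple arithmetic. With a replication factor of N, a read acknowledged by R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N, because two subsets of N replicas that together contain more than N members must share at least one. The sketch below applies that rule to a few of Cassandra's single-datacenter consistency level names (ONE, TWO, THREE, QUORUM, ALL), taking QUORUM as floor(N/2) + 1. It is a simplification that ignores multi-datacenter levels such as LOCAL_QUORUM, failed or partial writes, and timestamp-based conflict resolution; the function names are hypothetical.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements needed for a consistency level.
    Covers only ONE, TWO, THREE, QUORUM and ALL (single datacenter)."""
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of replicas
    if level == "ALL":
        return rf
    n = {"ONE": 1, "TWO": 2, "THREE": 3}[level]
    if n > rf:
        raise ValueError(f"{level} needs {n} replicas but rf is {rf}")
    return n


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every possible read replica set overlaps every possible
    write replica set (R + W > RF), so a read always reaches at least one
    replica holding the latest acknowledged write."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, read_sees_latest_write(read, write, rf=3))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Choosing ONE for both reads and writes favors latency and availability, while QUORUM for both trades some latency for reads that always include the most recently acknowledged write.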

