10 years of Apache Cassandra

A look at the development of Apache Cassandra from a small database open sourced out of Facebook into one of the top ten databases used worldwide.

Apache Cassandra, developed by Avinash Lakshman and Prashant Malik to solve the Inbox search problem at Facebook, was released as free software under the Apache License 2.0 in July 2008. Cassandra is a scalable datastore with no single point of failure, which makes it well suited to high-availability applications. It supports multi-datacenter replication and scales linearly, so nodes can be added to a Cassandra cluster in any datacenter. According to the project website, the largest known Cassandra setup involves over 300TB of data on over 400 machines.

After ten years of development, driven in part by contributions from IBM, Twitter and Rackspace, Cassandra is now used by Netflix, eBay, Twitter, Reddit and many others, and is one of the most popular NoSQL databases in use today. To find out more about the impact Cassandra has had on the development community, we speak to former Apache Cassandra project chair Jonathan Ellis, currently SVP and CTO, DataStax; Aaron Morton, CEO at Cassandra consultants, The Last Pickle; and open source consultant Carlos Rolo.


How has Cassandra impacted the community?

Jonathan Ellis, SVP and CTO, DataStax, first came into contact with Cassandra at the end of 2008, when he was hired by Rackspace to build them a next-generation, scalable database. He explains that Cassandra was one of a number of ‘NoSQL’ options available at the time, but argues that SQL itself wasn’t the problem: “SQL is a quite reasonable language for getting data in and out of a server.”

The introduction of Cassandra Query Language (CQL) with Cassandra 1.1 in 2012 was one of the most important steps for the community, according to Ellis, because it meant developers had an API portable across languages and suitable for a REPL. “We were the first to introduce this,” Ellis explains, “with almost universal adoption of a similar approach by other NoSQL databases.” The only notable holdout today is Amazon’s DynamoDB, and Ellis doesn’t believe that will last – “I predict that it won’t be long before they follow suit as well.”

For Ellis, the biggest contribution Cassandra has made is that app developers, whether Cassandra users or not, realized that you don’t need ACID for most common tasks. “Cassandra defaults to eventually consistent operations (where ‘eventually’ is typically single-digit or even sub-millisecond latencies), and allows users to opt in to lightweight transactions when linearizable consistency is called for.”
