Almost unlimited scalability is an essential facet of cloud computing as it is offered by the Google App Engine or CloudFoundry. Insuring this feature leads to a trade-off with other nonfunctional aspects from enterprise computing like consistency. But why? The answer to that question is given by the so-called CAP theorem.
The CAP theorem was formulated by Prof. Eric A. Brewer during his keynote speech at the ACM Symposium on Principles of Distributed Computing back in 2000 – long before the advent of the term cloud computing. Hence it is also known as Brewer’s Theorem. Seth Gilbert and Nancy Lynch from MIT provided a formal proof two years later.
Was does the theorem say? The acronym CAP summarizes the terms Consistency, Availability and Partition Tolerance. In a distributed system …
- Consistency means that all nodes see the same data at the same time.
- Availability means a guarantee that every request receives a response about whether it was successful or failed
- Partition Tolerance means the system continues to operate despite arbitrary message loss.
Brewer discovered that a distributed system can satisfy any two of these guarantees at the same time, but not all three.
The following triangle demonstrates this:
The system properties C, A and P can be regarded as gradual quantities, i.e. the availability is high if the system has short response times, and low if the system has slow response times. With regard to consistency, the system has a consistent state in an instant (see ACID principle of relational database management systems) or after a certain time frame of inconsistency (BASE principle of NoSQL datastores).
Any given distributed system is located on one of the sides (CA), (CP) or (AP) of our triangle. In real world 24/7 applications high availability is always required. So it can be questioned if a (CP) system makes any sense and if our choice is only between (CA) and (AP).
Before applying the CAP theorem to cloud computing, I’d like to give a few examples of distributed systems that most of us will know already.
DNS – Domain Name System
DNS is an (AP) system. Availability is really high (do you remember the last downtime of your DNS server?), the same is true for the partition tolerance. If you ever had the chance of running a DNS server yourself, you will know that consistency is not always given at in instant. It can sometimes take days for an DNS entry to travel its ways to the root hierarchy and before being visible by all other nodes.
RDBMS – Relational Database Management Systems
Most of you will know relational database management systems like DB2 or Oracle. They especially ensure one thing: consistency, driven by the ACID principle. All operations are atomic, consistent, isolated and durable.
A RDBMS is a (CA) system. Availability and consistency of a single node is very high. Building clustered systems with data replication leads to a decrease of availability because a consistent transaction takes up more time. The more nodes you have to synchronize their data, the longer such transactions will take.
CAP Theorem and Cloud Computing
Cloud platforms rely on horizontal scaling (scale-out), i.e. the load is distributed over a lot of single nodes, which may run on cheap commodity hardware.
For that reason a cloud infrastructure has to cope with regualur outages of single nodes (partition tolerance). High availability is a must have, since the end user will not tolerate response times above a certain level any more these days before trying the next competitor. So a cloud platform (or at least large parts of it) is an (AP) system.
Because of the CAP theorem we can’t have ACID consistency in an (AP) system. Being not strictly consistent does not mean your data will be corrupt all the time. There are weaker concepts of consistency than ACID. But these concepts are often acceptable in your application.
The counterpart of the ACID principle of RDBMS is the BASE principle often found in NoSQL datastores. BASE means
- Soft State,
- Eventual consistency
Ensuring eventual consistency basically means that the system is not consistent at an instant at the end of a business transaction, but after a (preferably short) time of inconsistency, so that all nodes see the same data again in the end.
Let’s have a look at Twitter (or any other social network of your choice). You may have noticed that changes to your profile or follower lists do not show up at once (neither for you or other users). Without digging to deep into the architecture of Twitter one can assume that such updates are asychronously processed. But does it matter to the user? Not really – at the end the updates will show up.