CAP Theorem In Distributed System

CAP Theorem In Distributed System

  • avatar
    Name
    Meysam Hadeli
  • The CAP Theorem was introduced by Eric Brewer in 2000. It is a fundamental principle in distributed systems that describes the trade-off between consistency, availability, and partition tolerance. And in the distributed systems, we can only have two of three properties simultaneously.

    Let's take a look at more details for each specific property in CAP theorem:

    Consistency: In consistency, all nodes should get the same data simultaneously. When we write data in one node, it should be replicated across the other nodes in the system before the write is deemed "successful".

    Availability: In availability, the system should always be operational and ready to respond to the request (read or write). This means that the system will operate even if some nodes are down. and there is no guarantee that the response is the most recent write operation.

    Partition Tolerance: In partition tolerance, the system should continue to work where nodes cannot communicate with each other. This means that the system keeps running if the messages are failed or delayed between nodes, or some nodes are shot down.

    Not: The mean of nodes here is the replication of each database in a distributed system.

    Why should we choose two of three properties in CAP?

    According to the CAP concept, we can only choose two of three properties at the same time that are more important for our distributed system. We can choose between consistency and availability, but partition tolerance is a fundamental requirement in distributed systems because complete network reliability cannot be guaranteed. In a distributed system, it is vital that the system continue to work and provide services even when some nodes are down to communicate with each other.

    CA (Consistency, Availability): The CA database delivers consistency and availability across all nodes, but not fault tolerance. In any distributed system, network partition will happen, and using this type of database is a practical choice. A relational database like Postgres is a CA database, and it has a single node that can deliver consistency and availability without network partition. We can deploy them to nodes using replication. The real-world example of a CA system is e-commerce platforms like Amazon or eBay, which often employ a blend of consistency and availability.

    CP (Consistency, Partition Tolerance): The CP database delivers consistency and partition tolerance, but not availability. In these systems, when a network partition occurs, we have strong consistency, although we sacrifice availability. During partition, the system has to shut down the non-consistent nodes until the partition is resolved. A NoSQL database like Mongo is a CP database, and it has a primary node that handles all write requests and maintains consistency across the system. Secondary nodes replicate data from the primary node and keep up-to-date copies to ensure data consistency. The real-world example of a CP system is financial systems like banking applications, where strong consistency is very important.

    AP (Availability, Partition Tolerance): The AP database delivers availability and partition tolerance, but not consistency. During partition, all nodes are available and system provide responses to requests, although some of the responses are not up-to-date. When the partition in the end is resolved, most AP databases will sync the nodes to ensure consistency across them. A NoSQL database like Cassandra is an AP database, and it has no primary node, and all of the nodes remain available, although consistency across nodes might lag until the partition is resolved. The real-world example of an AP system is social media platforms like Facebook and Twitter, whose high availability is very important.

    Conclusion:

    The CAP theorem is one of the cornerstone concepts of distributed systems design, which posits that a system cannot be consistent, available, and partition-tolerant simultaneously. A distributed system can only provide two out of the three. This implies that you have to decide what is most important for your application when you develop a distributed system. For example, some systems may decide that even in the worst-case scenarios they will be available and partition-tolerant but not always give the latest data (consistency). While other systems might not be fully available during network issues, they still may be consistent. CAP theorem exegesis enables programmers to make the right choices in constructing dependable and scalable systems. It stresses the point that you need to be ready to sacrifice in one area to the benefit of another part of the system. Making the correct balance is an important factor in finding the best solution for your application.