Microservices architecture has become very popular. However, one common problem is how to manage distributed transactions across multiple microservices. In this post we look at distributed transactions in microservices and walk through the main solutions and patterns for handling them.
In a microservices architecture, each service has its own local database. A distributed operation is therefore split into smaller operations, each served by a different microservice, so the transaction is spread across those services. This differs from a monolithic system, where all the operations run in a single transaction against different tables of one database.
There are several solutions and patterns for handling distributed transactions in microservices, and this article covers some of them.
1. Two-Phase Commit
Two-phase commit (2PC) is a standardized protocol that ensures that a database commit is applied atomically across multiple resources. In a two-phase commit, we have a controlling node that houses most of the logic, and we have a few participating nodes on which the actions are performed.
Any transaction made has to go through a transaction manager. This manager ensures that a transaction occurs across all the services (resources). As the name suggests, there are two phases in this mechanism – Commit Request Phase and Commit Phase.
In the first phase of the protocol, called the commit-request phase or voting phase, a coordinator (normally the process that initiated the commit) collects the approval or rejection of each involved process for committing the data changes. Only if all participants approve does the coordinator decide to commit; otherwise, it decides to abort.
In the second phase (a commit phase) the coordinator informs the participants about the result. According to the result, either the entire transaction is rolled back or all the subtransactions are successfully completed and the temporarily locked resources are released.
During the two phases, the following messages are exchanged between the coordinator and the participants:
Commit Request Phase
- The coordinator sends a query-to-commit message to all participants and waits for their answers.
- Participants execute the transaction up to the point where they would commit, without committing yet. They write the entries in their undo and redo logs.
- Participants respond to the coordinator with “ready” if the transaction was successful, or with “failed” if the transaction failed.
Commit Phase
- If the coordinator receives a “ready” message from all the participants:
  - The coordinator sends commit to all participants.
  - Participants complete the transaction with a commit and release all locks and resources.
  - Participants send back an acknowledgment.
  - The coordinator completes the transaction once it has received an acknowledgment from all the participants.
- If one of the participants responds with a “failed” message:
  - The coordinator sends abort to all participants.
  - Participants terminate the transaction with a rollback (using the undo log) and release all locks and resources.
  - Participants send an acknowledgment to the coordinator.
  - The coordinator undoes the transaction once it has received an acknowledgment from all the participants.
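The message flow above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not a production protocol implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and real participants would hold locks and write undo/redo logs instead of flipping a string.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    FAILED = "failed"


class Participant:
    """Hypothetical in-memory participant (assumption: no real locks or logs)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Commit request phase: do the work up to the commit point, then vote.
        if self.can_commit:
            self.state = "prepared"
            return Vote.READY
        self.state = "failed"
        return Vote.FAILED

    def commit(self):
        self.state = "committed"
        return True  # acknowledgment to the coordinator

    def rollback(self):
        self.state = "rolled_back"
        return True  # acknowledgment to the coordinator


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only on a unanimous "ready", otherwise abort everywhere.
    if all(vote is Vote.READY for vote in votes):
        acks = [p.commit() for p in participants]
        return "committed" if all(acks) else "in_doubt"
    acks = [p.rollback() for p in participants]
    return "aborted" if all(acks) else "in_doubt"


services = [Participant("flights"), Participant("cars", can_commit=False)]
outcome = two_phase_commit(services)  # one "failed" vote aborts the whole transaction
```

Note that the sketch calls participants one after another and never times out, which mirrors the blocking nature of the protocol.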
Advantages of Two-Phase Commit
- Guarantees Consistency
- Easy to understand
- Centralized distributed transaction management
Disadvantages of Two-Phase Commit
- The transaction manager is a single point of failure. If it fails after the participants have voted, they cannot resolve their transactions until it recovers.
- The protocol is blocking. Participants hold their locks and resources until they hear the coordinator’s decision, and the coordinator waits until every participant has replied.
- Because 2PC is synchronous (blocking), it is not recommended for many microservice-based systems.
- Throughput is reduced because of the locks.
- It is not supported by many NoSQL databases or message brokers.
- In terms of the CAP theorem, 2PC favors consistency at the expense of availability.
2. SAGA Pattern
The Saga pattern is another widely used pattern for distributed transactions. Unlike 2PC, which is synchronous, the Saga pattern is asynchronous and reactive.
Implement each business transaction that spans multiple services as a saga. A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.
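As a rough illustration of that definition, a saga can be modeled as an ordered list of (local transaction, compensating transaction) pairs. The sketch below is an assumption-laden simplification: `run_saga`, `BusinessRuleViolation`, and the step names are hypothetical, and everything runs in one process instead of being triggered by messages between services.

```python
class BusinessRuleViolation(Exception):
    """Raised by a local transaction that cannot complete (hypothetical name)."""


def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except BusinessRuleViolation:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def failing_step():
    raise BusinessRuleViolation("step B violates a business rule")


steps = [
    (lambda: log.append("step A done"), lambda: log.append("step A undone")),
    (failing_step, lambda: log.append("step B undone")),
]
result = run_saga(steps)
```

The failed step itself is not compensated, since its local transaction never committed; only the preceding steps are undone.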
There are two ways to coordinate sagas:
- Choreography – each local transaction publishes domain events that trigger local transactions in other services
- Orchestration – an orchestrator (object) tells the participants what local transactions to execute
Events Choreography Approach for SAGA
In the Events/Choreography approach, events drive the business transaction: the first service executes a local transaction and then publishes an event. One or more services listen for that event, execute their own local transactions, and publish (or not) new events.
The distributed transaction ends when the last service executes its local transaction and does not publish any events or the event published is not heard by any of the saga’s participants.
In our sample, when a user sends a BookingVacation command to the BookingVacation Service, these steps occur:
- The BookingVacation Service sends a BookFlight command to the Flights Service through the message broker, and the handler in the Flights Service runs the business logic for booking a flight.
- After booking the flight successfully, the Flights Service publishes a FlightBooked event. Other services can subscribe to this event for the next step; in this example the Hotel Service does, and its event handler runs the business logic for booking a hotel.
- After booking the hotel successfully, the Hotel Service publishes a HotelBooked event. The Cars Service subscribes to this event, and its event handler runs the business logic for renting the cars.
- After renting the cars successfully, the Cars Service publishes a CarRented event. The API subscribes to this event and sets the state of the vacation booking to Succeeded.
If the state of the booking vacation needs to be tracked, BookingVacation Service could simply listen to all events and update its state.
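The happy path above can be sketched with a tiny in-memory broker. This is a minimal sketch under stated assumptions: `MessageBroker` is a hypothetical synchronous stand-in for a real message broker, the handlers contain no real business logic, and the state names ("Pending", "Succeeded") are illustrative.

```python
from collections import defaultdict


class MessageBroker:
    """Hypothetical in-memory stand-in for a real message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = []  # every message published, in order

    def subscribe(self, message_type, handler):
        self.handlers[message_type].append(handler)

    def publish(self, message_type, booking_id):
        self.history.append(message_type)
        for handler in list(self.handlers[message_type]):
            handler(booking_id)


broker = MessageBroker()
booking_state = {}

# Each service reacts to one message and publishes the next event.
# No service calls another service directly.
broker.subscribe("BookFlight", lambda bid: broker.publish("FlightBooked", bid))  # Flights Service
broker.subscribe("FlightBooked", lambda bid: broker.publish("HotelBooked", bid))  # Hotel Service
broker.subscribe("HotelBooked", lambda bid: broker.publish("CarRented", bid))  # Cars Service
broker.subscribe("CarRented", lambda bid: booking_state.update({bid: "Succeeded"}))  # BookingVacation Service


def book_vacation(booking_id):
    booking_state[booking_id] = "Pending"
    broker.publish("BookFlight", booking_id)


book_vacation("booking-1")
```

The order of the whole transaction is not written down anywhere in this sketch; it only emerges from the subscriptions, which is the defining trait of choreography.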
To handle failure, each microservice that takes part in the distributed transaction has to implement compensation logic that rolls back what it has already committed.
Suppose, for example, that the last step fails and the Cars Service publishes a CarRentalRejected event to the message broker. Because we are inside a distributed transaction, we need to return to the state we had before the transaction started, and that is what the compensation logic is for.
We create an event handler for CarRentalRejected in the Hotel and Flights services as compensation logic that cancels the hotel and the flight. The BookingVacation Service also handles this event and updates the state of the booking to failed.
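A minimal sketch of this compensation logic, assuming the flight and hotel have already been booked when the rejection arrives. The `subscribe` and `publish` helpers and the `state` dictionary are hypothetical stand-ins for the message broker and for each service’s local database.

```python
from collections import defaultdict

handlers = defaultdict(list)

# Assumed starting point: flight and hotel are booked, the car rental is about to fail.
state = {"flight": "booked", "hotel": "booked", "booking": "Pending"}


def subscribe(event, handler):
    handlers[event].append(handler)


def publish(event):
    for handler in handlers[event]:
        handler()


# Compensation handlers: each service undoes only its own local transaction.
subscribe("CarRentalRejected", lambda: state.update(hotel="cancelled"))  # Hotel Service
subscribe("CarRentalRejected", lambda: state.update(flight="cancelled"))  # Flights Service
subscribe("CarRentalRejected", lambda: state.update(booking="Failed"))  # BookingVacation Service

# The Cars Service could not rent a car, so it publishes the rejection event.
publish("CarRentalRejected")
```

Compensation here is a new forward action (a cancellation), not a database-level rollback, because each local transaction was already committed.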
Advantages of Event Choreography
- It is simple and easy to understand
- Fits into an event-driven architecture
- Suitable for transactions that involve only a few steps
- No extra code for locking and unlocking resources in the services
- Does not require much effort to build
- Based on Asynchronous operations
- All participants are loosely coupled as they don’t have direct knowledge of each other
Disadvantages of Event Choreography
- Risk of cyclic dependencies, because services listen to each other’s events
- Difficult to track which services listen to which events
- The domain objects have to know a lot about what is going on across all the services that are part of the transaction
- Testing is painful: checking that the whole transaction works requires all services to be running, and it is hard to track messages and verify that the sequence is correct
Orchestration Approach for SAGA
A saga orchestrator is a persistent object that tracks the state of the saga and invokes the participants. The saga orchestrator communicates with each service in a command/reply style telling them what operation should be performed.
In more detail, when the saga orchestrator is created it tells a saga participant what to do. When a reply comes back, it works out which participant to invoke next, based on whether that transaction succeeded, sends that participant a message, and updates its state, which is persisted in the database. It goes around that loop until it runs out of things to do, at which point it is done.
In our sample, when a user sends a BookingVacation command to the Vacation Service, the following steps occur:
- The Vacation Service sends the BookingVacation command to the BookingVacation Saga, which orchestrates and starts the transaction.
- The Saga sends a BookFlight command to the Flights Service through the message broker.
- The Flights Service processes the BookFlight command and replies with a FlightBooked message, in the form of an event, through the message broker.
- The Saga, which is aware of the whole process, handles the FlightBooked event with the appropriate event handler and publishes the next command, BookHotel, to the message broker.
- The Hotel Service processes the BookHotel command and, once done, replies by publishing a HotelBooked event to the message broker.
- The Saga receives the reply, handles the event, and publishes the last command, RentCar, for the Car Service to the message broker.
- The Car Service receives the RentCar command, processes it, and sends a CarRented event for the Saga to the message broker.
- The Saga receives the CarRented reply and finishes the transaction.
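The command/reply loop in these steps can be sketched as a small state machine. This is an illustrative sketch, not a real saga framework: the `BookingVacationSaga` class, the `deque` used as a stand-in for the broker’s command queue, and the `PARTICIPANTS` table that always replies with success are all assumptions.

```python
from collections import deque


class BookingVacationSaga:
    """Hypothetical orchestrator: tracks saga state and picks the next command per reply."""

    # Reply received -> next command to send.
    NEXT_COMMAND = {
        "FlightBooked": "BookHotel",
        "HotelBooked": "RentCar",
    }

    def __init__(self, broker):
        self.broker = broker
        self.state = "Started"

    def start(self):
        self.broker.append("BookFlight")

    def on_reply(self, reply):
        self.state = reply  # a real saga would persist this state to its database
        if reply == "CarRented":
            self.state = "Completed"
        else:
            self.broker.append(self.NEXT_COMMAND[reply])


# Participants only turn a command into a reply; they know nothing about each other.
PARTICIPANTS = {
    "BookFlight": "FlightBooked",  # Flights Service
    "BookHotel": "HotelBooked",  # Hotel Service
    "RentCar": "CarRented",  # Car Service
}

broker = deque()  # stand-in for the message broker's command queue
sent = []
saga = BookingVacationSaga(broker)
saga.start()
while broker:
    command = broker.popleft()
    sent.append(command)
    saga.on_reply(PARTICIPANTS[command])
```

In contrast to choreography, the sequence of the transaction is stated in one place (`NEXT_COMMAND`), which is what makes the flow easy to follow.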
Handling Failure in the Orchestration Approach
Suppose that renting the car fails. The Car Service publishes a reply message in the form of a CarRentalRejected event. The Saga receives the reply, handles the event, and starts the compensating operations: it publishes a CancelHotelBook command to the Hotel Service to cancel the hotel booking, and another command to the Flights Service to cancel the flight. These compensation commands act as the rollback of the transaction: all services return to the state they were in before the transaction, and a new transaction can be started.
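The failure path can be sketched as follows. Treat this as a sketch under assumptions: `run_booking_saga` and `fake_send` are hypothetical, `send` is a synchronous stand-in for publishing a command and waiting for its reply, and `CancelFlight` is an assumed command name (the text above only names CancelHotelBook).

```python
def run_booking_saga(send):
    """Hypothetical orchestrator loop; send(command) returns the participant's reply."""
    # (command, reply expected on success, compensating command)
    steps = [
        ("BookFlight", "FlightBooked", "CancelFlight"),  # CancelFlight is an assumed name
        ("BookHotel", "HotelBooked", "CancelHotelBook"),
        ("RentCar", "CarRented", None),
    ]
    compensations = []
    for command, success_reply, compensation in steps:
        reply = send(command)
        if reply != success_reply:
            # Undo the steps that already succeeded, most recent first.
            for undo in reversed(compensations):
                send(undo)
            return "Failed"
        if compensation is not None:
            compensations.append(compensation)
    return "Completed"


sent = []


def fake_send(command):
    """Test double: the car rental is rejected, everything else succeeds."""
    sent.append(command)
    replies = {
        "BookFlight": "FlightBooked",
        "BookHotel": "HotelBooked",
        "RentCar": "CarRentalRejected",
    }
    return replies.get(command, "Ack")


result = run_booking_saga(fake_send)
```

Because the orchestrator records which steps succeeded, it knows exactly which compensations to send, and the participants need no failure-handling knowledge of each other.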
Advantages of Orchestration
- Avoids cyclic dependencies between services, as the saga orchestrator invokes the saga participants but the participants do not invoke the orchestrator
- Centralizes the coordination logic of the distributed transaction, which makes it easier to understand
- Reduces the participants’ complexity, as they only need to execute commands and reply
- Easier to implement and test
- The transaction complexity remains linear as new steps are added
- Rollbacks are easier to manage
- Reduces coupling, because domain objects don’t need much knowledge about other domains; that knowledge is placed in the orchestrator
Disadvantages of Orchestration
- Risk of concentrating too much logic in the orchestrator