2 Phase Commit with and without failures

1. Introduction

In distributed systems, a transaction is a sequence of operations across multiple, independently managed resources (databases, services, etc.) that must succeed or fail atomically—either all updates are applied, or none are. Achieving atomicity in a single process is straightforward (local locking, write‐ahead logs), but across distributed participants it becomes challenging:

  • Partial failures (node crashes, network partitions) can leave some participants updated and others not, violating atomicity.
  • Coordination requires consensus on the final outcome (commit vs. abort).

A commit protocol is a coordination algorithm ensuring that either all participants commit or all abort, even in the face of failures.


2. Why Not One-Phase Commit?

A naïve “one-phase commit” simply has the coordinator instruct each participant:

  1. “Commit your changes.”
  2. Participants do so and reply “Done.”

Problem: If the coordinator crashes after sending some “Commit” messages but before others, some participants commit while others remain uncoordinated, violating atomicity. There is no mechanism for participants to recover the global decision without re-contacting the (possibly failed) coordinator.


3. Two-Phase Commit Overview

The Two-Phase Commit (2PC) protocol solves this by splitting consensus into two phases, with durable logging at each participant:

Here's the reformatted table with improved alignment and readability:

3. Two-Phase Commit Overview

The Two-Phase Commit (2PC) protocol solves this by splitting consensus into two phases, with durable logging at each participant:

Phase Coordinator Action Participant Action
1: Prepare (Voting) Send PREPARE (can you commit?) On PREPARE:
1. Durably log "READY"
2. Reply VOTE_COMMIT (if able) or VOTE_ABORT
2: Commit/Abort (Decision) If all votes = commit:
- Log "COMMIT"
- Send GLOBAL_COMMIT
Otherwise:
- Log "ABORT"
- Send GLOBAL_ABORT
On decision:
1. Log decision
2. Execute commit/abort
3. Send ACK to coordinator

Key Mechanisms:

  • Durable logging at every step lets crashed nodes recover their last stable state.
  • Timeouts: participants and coordinator use timers to detect lost messages/failures and take recovery actions.
  • Blocking: if the coordinator dies mid-decision, participants in “READY” wait (blocked) until they learn the global decision.

4. Detailed Protocol States

Coordinator States

  1. INIT: before transaction start.
  2. WAIT_VOTES: after sending PREPARE, awaiting all votes.
  3. DECIDED: made decision (commit or abort), sending global decision.
  4. COMPLETED: after receiving all ACKs.

Participant States

  1. INIT: before PREPARE.
  2. READY: voted commit, waiting for decision.
  3. COMMITTING / ABORTING: upon receiving global decision.
  4. DONE: after sending ACK.

5. Failure Tolerance and Recovery

5.1 Coordinator Crash

  • Before Phase 2 (after voting): participants in READY log “READY” but have no decision → blocking.
  • After Phase 2: some participants may get GLOBAL_COMMIT while others don’t. On recovery, the coordinator re-reads its log and re-sends the decision to stragglers.

5.2 Participant Crash

  • Before Vote: coordinator times out → decides abort (presumed abort).
  • After Vote_COMMIT: participant, on recovery, reads “READY” in its log and contacts coordinator for the final decision.
  • After Global Decision, Before ACK: coordinator times out waiting for ACK and re-sends decision; participant idempotently processes and responds.

5.3 Network Failures

  • Message Loss: timeouts trigger retransmission.
  • Delays: participants remain in their logged state until a valid message arrives or timeout elapses.
  • Reordering: sequence numbers or protocol state prevent stale messages from being mis-interpreted.

6. Blocking and Optimizations

Blocking Nature

Participants in READY cannot unilaterally decide to abort—they risk violating atomicity if the global decision was commit. This blocking is a well-known drawback of 2PC under coordinator failure.

Common Optimizations

  • Presumed Abort: skip logging “ABORT” to reduce I/O; if no record of commit exists, treat as abort on recovery.
  • Presumed Commit: skip logging “COMMIT” (less common); trade-off assumes most transactions commit.
  • Three-Phase Commit (3PC): adds a pre-commit phase to avoid blocking, at the cost of more messages.

7. Practical Considerations

  • Logging Overhead: every state transition must be durably logged; batching or group commit can amortize cost.
  • Timeout Tuning: must balance sensitivity to failures vs. false timeouts under high latency.
  • Scalability: 2PC’s message complexity is O(N) per phase; for many participants, coordinator becomes a bottleneck.
  • Coordinator High-Availability: real systems often replicate the coordinator (e.g., via Paxos) to reduce blocking.