How transactions work?
This section describes transaction components and how the components work together. For the complete design details, see PIP-31: Transactional Streaming.
Key concept
It is important to know the following key concepts, which is a prerequisite for understanding how transactions work.
Transaction coordinator
The transaction coordinator (TC) is a module running inside a Pulsar broker.
-
It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status.
-
It handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.
Transaction log
All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.
The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions).
Transaction buffer
Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted.
Transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics. After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.
Transaction ID
Transaction ID (TxnID) identifies a unique transaction in Pulsar. The transaction ID is 128-bit. The highest 16 bits are reserved for the ID of the transaction coordinator, and the remaining bits are used for monotonically increasing numbers in each transaction coordinator. It is easy to locate the transaction crash with the TxnID.
Pending acknowledge state
Pending acknowledge state maintains message acknowledgments within a transaction before a transaction completes. If a message is in the pending acknowledge state, the message cannot be acknowledged by other transactions until the message is removed from the pending acknowledge state.
The pending acknowledge state is persisted in the pending acknowledge log (cursor ledger). A new broker can restore the state from the pending acknowledge log to ensure the acknowledgment is not lost.
Data flow
At a high level, the data flow can be split into several steps:
-
Begin a transaction.
-
Publish messages with a transaction.
-
Acknowledge messages with a transaction.
-
End a transaction.
To help you debug or tune the transaction for better performance, review the following diagrams and descriptions.
1. Begin a transaction
Before introducing the transaction in Pulsar, a producer is created and then messages are sent to brokers and stored in data logs.
Let’s walk through the steps for beginning a transaction.
Step | Description |
---|---|
1.1 | The first step is that the Pulsar client finds the transaction coordinator. |
1.2 | The transaction coordinator allocates a transaction ID for the transaction. In the transaction log, the transaction is logged with its transaction ID and status (OPEN), which ensures the transaction status is persisted regardless of transaction coordinator crashes. |
1.3 | The transaction log sends the result of persisting the transaction ID to the transaction coordinator. |
1.4 | After the transaction status entry is logged, the transaction coordinator brings the transaction ID back to the Pulsar client. |
2. Publish messages with a transaction
In this stage, the Pulsar client enters a transaction loop, repeating the consume-process-produce
operation for all the messages that comprise the transaction. This is a long phase and is potentially composed of multiple produce and acknowledgment requests.
Let’s walk through the steps for publishing messages with a transaction.
Step | Description |
---|---|
2.1.1 | Before the Pulsar client produces messages to a new topic partition, it sends a request to the transaction coordinator to add the partition to the transaction. |
2.1.2 | The transaction coordinator logs the partition changes of the transaction into the transaction log for durability, which ensures the transaction coordinator knows all the partitions that a transaction is handling. The transaction coordinator can commit or abort changes on each partition at the end-partition phase. |
2.1.3 | The transaction log sends the result of logging the new partition (used for producing messages) to the transaction coordinator. |
2.1.4 | The transaction coordinator sends the result of adding a new produced partition to the transaction. |
2.2.1 | The Pulsar client starts producing messages to partitions. The flow of this part is the same as the normal flow of producing messages except that the batch of messages produced by a transaction contains transaction IDs. |
2.2.2 | The broker writes messages to a partition. |
3. Acknowledge messages with a transaction
In this phase, the Pulsar client sends a request to the transaction coordinator and a new subscription is acknowledged as a part of a transaction.
Let’s walk through the steps for acknowledging messages with a transaction.
Step | Description |
---|---|
3.1.1 | The Pulsar client sends a request to add an acknowledged subscription to the transaction coordinator. |
3.1.2 | The transaction coordinator logs the addition of subscription, which ensures that it knows all subscriptions handled by a transaction and can commit or abort changes on each subscription at the end phase. |
3.1.3 | The transaction log sends the result of logging the new partition (used for acknowledging messages) to the transaction coordinator. |
3.1.4 | The transaction coordinator sends the result of adding the new acknowledged partition to the transaction. |
3.2 | The Pulsar client acknowledges messages on the subscription. The flow of this part is the same as the normal flow of acknowledging messages except that the acknowledged request carries a transaction ID. |
3.3 | The broker receiving the acknowledgment request checks if the acknowledgment belongs to a transaction or not. |
4. End a transaction
At the end of a transaction, the Pulsar client decides to commit or abort the transaction. The transaction can be aborted when a conflict is detected in acknowledging messages.
4.1 End transaction request
When the Pulsar client finishes a transaction, it issues an end transaction request.
Let’s walk through the steps for ending the transaction.
Step | Description |
---|---|
4.1.1 | The Pulsar client issues an end transaction request (with a field indicating whether the transaction is to be committed or aborted) to the transaction coordinator. |
4.1.2 | The transaction coordinator writes a COMMITTING or ABORTING message to its transaction log. |
4.1.3 | The transaction log sends the result of logging the committing or aborting status. |
4.2 Finalize a transaction
The transaction coordinator starts the process of committing or aborting messages to all the partitions involved in this transaction.
Let’s walk through the steps for finalizing a transaction.
Step | Description |
---|---|
4.2.1 | The transaction coordinator commits transactions on subscriptions and commits transactions on partitions at the same time. |
4.2.2 | The broker (produce) writes produced committed markers to the actual partitions. At the same time, the broker (ack) writes acked committed marks to the subscription pending ack partitions. |
4.2.3 | The data log sends the result of writing produced committed marks to the broker. At the same time, pending ack data log sends the result of writing acked committed marks to the broker. The cursor moves to the next position. |
4.3 Mark a transaction as COMMITTED or ABORTED
The transaction coordinator writes the final transaction status to the transaction log to complete the transaction.
Let’s walk through the steps for marking a transaction as COMMITTED or ABORTED.
Step | Description |
---|---|
4.3.1 | After all produced messages and acknowledgments to all partitions involved in this transaction have been successfully committed or aborted, the transaction coordinator writes the final COMMITTED or ABORTED transaction status messages to its transaction log, indicating that the transaction is complete. All the messages associated with the transaction in its transaction log can be safely removed. |
4.3.2 | The transaction log sends the result of the committed transaction to the transaction coordinator. |
4.3.3 | The transaction coordinator sends the result of the committed transaction to the Pulsar client. |