Architecture Overview
At the highest level, a Pulsar instance is composed of one or more Pulsar clusters. Clusters within an instance can replicate data amongst themselves.
In a Pulsar cluster:
- One or more brokers handle and load-balance incoming messages from producers, dispatch messages to consumers, communicate with the Pulsar configuration store for various coordination tasks, store messages in BookKeeper instances (aka bookies), and rely on a cluster-specific ZooKeeper cluster for certain tasks.
- A BookKeeper cluster consisting of one or more bookies handles persistent storage of messages.
- A ZooKeeper cluster specific to that cluster handles coordination tasks within the cluster.
The diagram below provides an illustration of a Pulsar cluster:
At the broader instance level, an instance-wide ZooKeeper cluster called the configuration store handles coordination tasks involving multiple clusters, such as geo-replication.
Brokers
The Pulsar message broker is a stateless component that's primarily responsible for running two other components:
- An HTTP server that exposes a REST API for administrative tasks and for topic lookup by producers and consumers. Producers connect to brokers to publish messages, and consumers connect to brokers to consume them.
- A dispatcher, which is an asynchronous TCP server that uses a custom binary protocol for all data transfers.
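Topic lookup is the step that tells a client which broker to connect to. The following minimal Python sketch models that idea under simplifying assumptions. The `LookupService` class, its least-loaded assignment rule, and the broker URLs are hypothetical, and Pulsar's real lookup and load balancing are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class LookupService:
    """Toy topic-lookup model; a hypothetical class, not Pulsar's API."""

    # Broker URL -> number of topics it owns (a crude stand-in for load).
    broker_load: dict[str, int]
    # Topic name -> owning broker URL (a stand-in for ownership metadata).
    ownership: dict[str, str] = field(default_factory=dict)

    def lookup(self, topic: str) -> str:
        # A topic that already has an owner always resolves to that broker.
        if topic in self.ownership:
            return self.ownership[topic]
        # Otherwise assign it to the least-loaded broker and record ownership.
        broker = min(self.broker_load, key=self.broker_load.__getitem__)
        self.ownership[topic] = broker
        self.broker_load[broker] += 1
        return broker


svc = LookupService({"pulsar://broker-1:6650": 0, "pulsar://broker-2:6650": 0})
topic = "persistent://public/default/orders"
producer_broker = svc.lookup(topic)
consumer_broker = svc.lookup(topic)
print(producer_broker == consumer_broker)  # True
```

Because ownership is recorded on the first lookup, a producer and a consumer of the same topic end up on the same broker.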
For performance, messages are typically dispatched from a managed ledger cache. If the backlog grows larger than the cache, the broker starts reading entries from BookKeeper.
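The cache-first read path can be pictured as a bounded in-memory cache in front of durable storage. This is a hypothetical sketch, not Pulsar's managed ledger code. `BookKeeperStub` and `ManagedLedgerCacheSketch` are invented names, and the evict-oldest policy is an assumption chosen for simplicity.

```python
from collections import OrderedDict


class BookKeeperStub:
    """Stand-in for durable storage; counts reads so the cache effect is visible."""

    def __init__(self) -> None:
        self.entries: dict[int, bytes] = {}
        self.reads = 0

    def read(self, entry_id: int) -> bytes:
        self.reads += 1
        return self.entries[entry_id]


class ManagedLedgerCacheSketch:
    """Keeps the most recently written entries in memory, up to max_entries."""

    def __init__(self, storage: BookKeeperStub, max_entries: int) -> None:
        self.storage = storage
        self.max_entries = max_entries
        self.cache: OrderedDict[int, bytes] = OrderedDict()

    def append(self, entry_id: int, payload: bytes) -> None:
        # Every entry is persisted; the newest ones are also cached.
        self.storage.entries[entry_id] = payload
        self.cache[entry_id] = payload
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the oldest cached entry

    def read(self, entry_id: int) -> bytes:
        # Consumers close to the tail are served from memory...
        if entry_id in self.cache:
            return self.cache[entry_id]
        # ...while a backlog older than the cache falls back to storage.
        return self.storage.read(entry_id)


bk = BookKeeperStub()
ledger = ManagedLedgerCacheSketch(bk, max_entries=2)
for i in range(3):
    ledger.append(i, f"msg-{i}".encode())
ledger.read(2)   # served from the cache
ledger.read(0)   # evicted, so read from storage
print(bk.reads)  # 1
```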
Finally, to support geo-replication on global topics, the broker manages replicators that tail the entries published in the local region and republish them to the remote region using the Pulsar Java client library.
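A replicator can be thought of as a cursor over the local topic that forwards anything it has not yet sent. The sketch below assumes in-memory lists for the local and remote topics. `Topic` and `ReplicatorSketch` are hypothetical names, and real replicators publish through the Pulsar Java client and must handle failures and acknowledgments that this toy ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Append-only list of messages in one region (a toy stand-in for a topic)."""

    messages: list[str] = field(default_factory=list)

    def publish(self, msg: str) -> None:
        self.messages.append(msg)


@dataclass
class ReplicatorSketch:
    """Tails a local topic and republishes new entries to a remote topic."""

    local: Topic
    remote: Topic
    cursor: int = 0  # index of the next local entry not yet replicated

    def poll(self) -> int:
        # Forward everything appended since the last poll, in order.
        pending = self.local.messages[self.cursor:]
        for msg in pending:
            self.remote.publish(msg)  # Pulsar's replicators use the Java client here
        self.cursor += len(pending)
        return len(pending)


local, remote = Topic(), Topic()
replicator = ReplicatorSketch(local, remote)
local.publish("m1")
local.publish("m2")
replicator.poll()
print(remote.messages)  # ['m1', 'm2']
```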
For a guide to managing Pulsar brokers, see the brokers guide.
Clusters
A Pulsar instance consists of one or more Pulsar clusters. Clusters, in turn, consist of:
- One or more Pulsar brokers
- A ZooKeeper quorum used for cluster-level configuration and coordination
- An ensemble of bookies used for persistent storage of messages
Clusters can replicate amongst themselves using geo-replication.
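One way to make the component list concrete is a small validated data model. The classes below are hypothetical. They encode only the minimums stated above: at least one broker, a ZooKeeper quorum, and bookies per cluster, plus an instance-wide configuration store. Host names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of one Pulsar cluster's components."""

    name: str
    brokers: tuple[str, ...]
    zookeeper: tuple[str, ...]  # cluster-level configuration and coordination
    bookies: tuple[str, ...]    # persistent storage of messages

    def validate(self) -> list[str]:
        problems = []
        if not self.brokers:
            problems.append(f"{self.name}: needs at least one broker")
        if not self.zookeeper:
            problems.append(f"{self.name}: needs a ZooKeeper quorum")
        if not self.bookies:
            problems.append(f"{self.name}: needs at least one bookie")
        return problems


@dataclass(frozen=True)
class InstanceSpec:
    """An instance: one or more clusters plus an instance-wide configuration store."""

    configuration_store: tuple[str, ...]
    clusters: tuple[ClusterSpec, ...]

    def validate(self) -> list[str]:
        problems = [] if self.clusters else ["instance needs at least one cluster"]
        if not self.configuration_store:
            problems.append("instance needs a configuration store")
        for cluster in self.clusters:
            problems.extend(cluster.validate())
        return problems


cluster = ClusterSpec("us-west", brokers=("broker-1",), zookeeper=("zk-1",), bookies=())
print(cluster.validate())  # ['us-west: needs at least one bookie']
```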
For a guide to managing Pulsar clusters, see the clusters guide.
Metadata store
The Pulsar metadata store maintains all the metadata of a Pulsar cluster, such as topic metadata, schemas, and broker load data. Pulsar uses Apache ZooKeeper for metadata storage, cluster configuration, and coordination. The Pulsar metadata store can be deployed on a separate ZooKeeper cluster or on an existing one, and a single ZooKeeper cluster can serve as both the Pulsar metadata store and the BookKeeper metadata store. However, if you deploy Pulsar brokers that connect to an existing BookKeeper cluster, you need separate ZooKeeper clusters for the Pulsar metadata store and the BookKeeper metadata store.
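The ZooKeeper sharing rule above can be written as a small planning function. `plan_zookeeper` is a hypothetical helper, not part of any Pulsar tool, and the connect strings in the example are placeholders.

```python
from typing import Optional


def plan_zookeeper(
    pulsar_zk: str,
    bookkeeper_zk: Optional[str] = None,
    existing_bookkeeper: bool = False,
) -> dict[str, str]:
    """Return the ZooKeeper connect string each metadata store would use.

    Hypothetical helper encoding two rules:
    * by default, one ZooKeeper cluster can serve both metadata stores;
    * brokers attached to an existing BookKeeper cluster need separate
      ZooKeeper clusters for the Pulsar and BookKeeper metadata stores.
    """
    if existing_bookkeeper and (bookkeeper_zk is None or bookkeeper_zk == pulsar_zk):
        raise ValueError("an existing BookKeeper cluster needs its own ZooKeeper cluster")
    # When no separate BookKeeper ZooKeeper is given, share the Pulsar one.
    return {"pulsar_metadata": pulsar_zk, "bookkeeper_metadata": bookkeeper_zk or pulsar_zk}


print(plan_zookeeper("zk-1:2181"))
# {'pulsar_metadata': 'zk-1:2181', 'bookkeeper_metadata': 'zk-1:2181'}
```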
Pulsar also supports other metadata backends, including etcd and RocksDB (RocksDB for standalone Pulsar only).
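Backend selection can be sketched as parsing a prefixed connection string. The `<backend>:<address>` format and the `choose_metadata_backend` function are assumptions made for this illustration. Check the broker configuration reference for your Pulsar version for the actual setting names and URL syntax.

```python
from typing import NamedTuple


class BackendChoice(NamedTuple):
    backend: str
    address: str


# Hypothetical "<backend>:<address>" prefixes chosen for this sketch only.
_BACKENDS = ("zk", "etcd", "rocksdb")


def choose_metadata_backend(url: str, standalone: bool) -> BackendChoice:
    """Split a hypothetical '<backend>:<address>' string and enforce the RocksDB rule."""
    backend, sep, address = url.partition(":")
    if not sep or backend not in _BACKENDS:
        raise ValueError(f"unknown metadata backend in {url!r}")
    if backend == "rocksdb" and not standalone:
        raise ValueError("RocksDB is only supported for standalone Pulsar")
    return BackendChoice(backend, address)


print(choose_metadata_backend("etcd:http://etcd-1:2379", standalone=False))
# BackendChoice(backend='etcd', address='http://etcd-1:2379')
```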
In a Pulsar instance:
- A configuration store quorum stores configuration for tenants, namespaces, and other entities that need to be globally consistent.
- Each cluster has its own local ZooKeeper ensemble that stores cluster-specific configuration and coordination data, such as topic ownership metadata (which brokers are responsible for which topics), broker load reports, BookKeeper ledger metadata, and more.
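The split between the two metadata layers amounts to a routing decision for each kind of metadata. The category names and the `metadata_home` function below are hypothetical labels taken from the list above, not Pulsar identifiers.

```python
# Hypothetical metadata categories, taken from the list above.
GLOBAL_KINDS = {"tenant", "namespace"}
LOCAL_KINDS = {"topic_ownership", "broker_load_report", "ledger_metadata"}


def metadata_home(kind: str, cluster: str) -> str:
    """Return which store should hold a piece of metadata of the given kind."""
    if kind in GLOBAL_KINDS:
        return "configuration-store"         # shared and globally consistent
    if kind in LOCAL_KINDS:
        return f"local-zookeeper/{cluster}"  # one ensemble per cluster
    raise ValueError(f"unclassified metadata kind: {kind!r}")


print(metadata_home("namespace", "us-west"))        # configuration-store
print(metadata_home("topic_ownership", "us-west"))  # local-zookeeper/us-west
```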
Configuration store
The configuration store maintains all the configurations of a Pulsar instance, such as clusters, tenants, namespaces, and partitioned-topic configurations. A Pulsar instance can have a single local cluster, multiple local clusters, or multiple cross-region clusters, so the configuration store shares these configurations across all clusters in the instance. The configuration store can be deployed on a separate ZooKeeper cluster or on an existing ZooKeeper cluster.
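Sharing can be modeled by giving every cluster a reference to the same configuration store and a private local store. In this sketch, plain dictionaries stand in for the ZooKeeper-backed stores, and the key names (such as `tenants/acme`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterView:
    """One cluster: a private local store plus a reference to the shared store."""

    name: str
    configuration_store: dict[str, object]  # the same object for every cluster
    local_store: dict[str, object] = field(default_factory=dict)  # per cluster


def build_instance(cluster_names: list[str]) -> dict[str, ClusterView]:
    shared: dict[str, object] = {}  # one configuration store for the instance
    return {name: ClusterView(name, shared) for name in cluster_names}


instance = build_instance(["us-west", "us-east", "eu"])
instance["us-west"].configuration_store["tenants/acme"] = {"allowed_clusters": ["us-west", "eu"]}
print("tenants/acme" in instance["eu"].configuration_store)  # True
instance["us-west"].local_store["owner/orders"] = "broker-1"
print("owner/orders" in instance["eu"].local_store)  # False
```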