Pulsar Terminology
Here is a glossary of terms related to Apache Pulsar:
Concepts​
Pulsar​
Pulsar is a distributed messaging system originally created by Yahoo but now under the stewardship of the Apache Software Foundation.
Message​
Messages are the basic unit of Pulsar. They're what producers publish to topics and what consumers then consume from topics.
Topic​
A named channel used to pass messages published by producers to consumers who process those messages.
Partitioned Topic​
A topic that is served by multiple Pulsar brokers, which enables higher throughput.
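To picture how a partitioned topic spreads load, imagine each message key being hashed onto one of N internal partitions, each of which can be served by a different broker. The sketch below is illustrative only: the `partition_for` and `partition_topic_name` helpers and the CRC32 hash are assumptions made for this example, not Pulsar's actual routing code. The `-partition-N` suffix mirrors the naming Pulsar uses for the internal topics behind a partitioned topic.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (hypothetical helper)."""
    # CRC32 is a stand-in hash for illustration, not Pulsar's routing hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_topic_name(topic: str, index: int) -> str:
    """Build the name of the internal topic backing one partition."""
    return f"{topic}-partition-{index}"


if __name__ == "__main__":
    topic = "persistent://my-tenant/my-namespace/orders"
    for key in ["user-1", "user-2", "user-3"]:
        idx = partition_for(key, 4)
        print(key, "->", partition_topic_name(topic, idx))
```

Because the same key always hashes to the same index, messages that share a key stay on one partition in this model.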
Namespace​
A grouping mechanism for related topics.
Namespace Bundle​
A virtual group of topics that belong to the same namespace. A namespace bundle is defined as a range between two 32-bit hashes, such as 0x00000000 and 0xffffffff.
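As a rough sketch of the hash-range idea, the code below splits the full 32-bit range into equal bundles and finds the bundle that a topic name falls into. The helper names and the use of CRC32 are assumptions for illustration; Pulsar's own hashing and bundle-splitting logic may differ.

```python
import zlib

FULL_RANGE = (0x00000000, 0xFFFFFFFF)


def split_bundles(num_bundles: int):
    """Split the 32-bit hash range into contiguous (start, end) bundles."""
    lo, hi = FULL_RANGE
    step = (hi - lo + 1) // num_bundles
    bounds = [lo + i * step for i in range(num_bundles)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def bundle_for(topic: str, bundles):
    """Return the bundle whose range contains the topic's hash."""
    # CRC32 is a stand-in 32-bit hash chosen for this sketch.
    h = zlib.crc32(topic.encode("utf-8")) & 0xFFFFFFFF
    for i, (start, end) in enumerate(bundles):
        is_last = i == len(bundles) - 1
        # Ranges are half-open, except the last one, which includes its end.
        if start <= h < end or (is_last and h == end):
            return (start, end)
    raise ValueError("bundles do not cover the full hash range")


if __name__ == "__main__":
    bundles = split_bundles(4)
    for start, end in bundles:
        print(f"0x{start:08x}_0x{end:08x}")
    print(bundle_for("persistent://my-tenant/my-namespace/orders", bundles))
```

With a single bundle the range is exactly 0x00000000 to 0xffffffff, matching the example in the definition above.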
Tenant​
An administrative unit for allocating capacity and enforcing an authentication/authorization scheme.
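The tenant, namespace, and topic levels show up directly in fully qualified topic names of the form `persistent://tenant/namespace/topic`. The parser below is a minimal sketch under that assumption; `TopicName` and `parse_topic_name` are hypothetical names for this example and are not part of any Pulsar client library.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # e.g. "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name: str) -> TopicName:
    """Split a fully qualified topic name into its hierarchy levels."""
    domain, sep, rest = name.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {name!r}")
    return TopicName(domain, *parts)


if __name__ == "__main__":
    print(parse_topic_name("persistent://my-tenant/my-namespace/my-topic"))
```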
Subscription​
A lease on a topic established by a group of consumers. Pulsar has four subscription modes (exclusive, shared, failover, and key_shared).
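The difference between subscription modes is easiest to see in how messages are handed to consumers. The toy `dispatch` function below models the exclusive, shared, and failover modes in a deliberately simplified way (round-robin for shared, first consumer as the active one for failover). It is an illustration of the concept, not how a broker implements dispatch.

```python
from itertools import cycle


def dispatch(mode, consumers, messages):
    """Return {consumer: [messages]} under a simplified model of each mode."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    delivered = {c: [] for c in consumers}
    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) != 1:
            raise ValueError("exclusive subscriptions allow only one consumer")
        delivered[consumers[0]].extend(messages)
    elif mode == "failover":
        # One active consumer receives everything; the rest are standbys.
        delivered[consumers[0]].extend(messages)
    elif mode == "shared":
        # Messages are spread across consumers (round-robin in this sketch).
        for consumer, msg in zip(cycle(consumers), messages):
            delivered[consumer].append(msg)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return delivered


if __name__ == "__main__":
    for mode in ("shared", "failover"):
        print(mode, dispatch(mode, ["c1", "c2"], ["m1", "m2", "m3"]))
```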
Pub-Sub​
A messaging pattern in which producer processes publish messages on topics that are then consumed (processed) by consumer processes.
Producer​
A process that publishes messages to a Pulsar topic.
Consumer​
A process that establishes a subscription to a Pulsar topic and processes messages published to that topic by producers.
Reader​
Pulsar readers are message processors much like Pulsar consumers but with two crucial differences:
- you can specify where on a topic readers begin processing messages (consumers always begin with the latest available unacked message);
- readers don't retain data or acknowledge messages.
Cursor​
The subscription position for a consumer.
Acknowledgment (ack)​
A message sent to a Pulsar broker by a consumer indicating that a message has been successfully processed. An acknowledgment (ack) is how Pulsar knows that the message can be deleted from the system; a message that is not acknowledged is retained until it is processed.
Negative Acknowledgment (nack)​
When an application fails to process a particular message, it can send a "negative ack" to Pulsar to signal that the message should be replayed at a later time. By default, failed messages are replayed after a 1-minute delay.
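The ack and nack life cycle can be sketched with a small in-memory simulation. `SubscriptionSim` is a hypothetical class written for this glossary, with time passed in explicitly as seconds; it only models the idea that an acked message is done, while a nacked message becomes available again after a delay (60 seconds here, matching the default mentioned above).

```python
import heapq


class SubscriptionSim:
    """Toy model of ack/nack handling; not a real Pulsar client."""

    def __init__(self, messages, redelivery_delay=60.0):
        self.redelivery_delay = redelivery_delay
        # Heap of (available_at, sequence, message).
        self._pending = [(0.0, i, m) for i, m in enumerate(messages)]
        heapq.heapify(self._pending)
        self._in_flight = {}   # delivered but not yet acked or nacked
        self.acked = []

    def receive(self, now):
        """Deliver the next available message, or None if nothing is due."""
        if self._pending and self._pending[0][0] <= now:
            _, seq, msg = heapq.heappop(self._pending)
            self._in_flight[seq] = msg
            return seq, msg
        return None

    def ack(self, seq):
        """Mark a delivered message as successfully processed."""
        self.acked.append(self._in_flight.pop(seq))

    def nack(self, seq, now):
        """Schedule a delivered message for redelivery after the delay."""
        msg = self._in_flight.pop(seq)
        heapq.heappush(self._pending, (now + self.redelivery_delay, seq, msg))


if __name__ == "__main__":
    sub = SubscriptionSim(["a", "b"])
    seq, msg = sub.receive(now=0)
    sub.nack(seq, now=0)           # processing "a" failed
    print(sub.receive(now=1))      # "b" is delivered next
    print(sub.receive(now=30))     # "a" is not yet due
    print(sub.receive(now=60))     # "a" is redelivered
```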
Unacknowledged​
A message that has been delivered to a consumer for processing but not yet confirmed as processed by the consumer.
Retention Policy​
Size and/or time limits that you can set on a namespace to configure retention of messages that have already been acknowledged.
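The effect of size and time limits on acknowledged messages can be sketched as follows. This is a simplified model that assumes a message is kept only while it satisfies both limits; the `AckedMessage` type and `apply_retention` function are hypothetical, and Pulsar's actual retention semantics and configuration options are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class AckedMessage:
    acked_at: float    # time of acknowledgment, in seconds
    size_bytes: int


def apply_retention(acked, now, max_age_s, max_bytes):
    """Return the acked messages kept under simplified time and size limits."""
    # Time limit: drop anything acknowledged longer ago than max_age_s.
    fresh = [m for m in acked if now - m.acked_at <= max_age_s]
    # Size limit: keep the newest messages until the byte budget is used up.
    kept, total = [], 0
    for m in sorted(fresh, key=lambda m: m.acked_at, reverse=True):
        if total + m.size_bytes > max_bytes:
            break
        kept.append(m)
        total += m.size_bytes
    return list(reversed(kept))   # oldest first


if __name__ == "__main__":
    backlog = [AckedMessage(0, 100), AckedMessage(50, 100), AckedMessage(90, 100)]
    print(apply_retention(backlog, now=100, max_age_s=60, max_bytes=150))
```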
Multi-Tenancy​
The ability to isolate namespaces, specify quotas, and configure authentication and authorization on a per-tenant basis.
Architecture​
Standalone​
A lightweight Pulsar broker in which all components run in a single Java Virtual Machine (JVM) process. Standalone clusters can be run on a single machine and are useful for development purposes.
Cluster​
A set of Pulsar brokers and BookKeeper servers (aka bookies). Clusters can reside in different geographical regions and replicate messages to one another in a process called geo-replication.
Instance​
A group of Pulsar clusters that act together as a single unit.
Geo-Replication​
Replication of messages across Pulsar clusters, potentially in different datacenters or geographical regions.
Configuration Store​
Pulsar's configuration store (previously known as global ZooKeeper) is a ZooKeeper quorum that is used for configuration-specific tasks. A multi-cluster Pulsar installation requires just one configuration store across all clusters.
Topic Lookup​
A service provided by Pulsar brokers that enables connecting clients to automatically determine which Pulsar broker is responsible for a topic (and thus where message traffic for the topic needs to be routed).
Service Discovery​
A mechanism provided by Pulsar that enables connecting clients to use just a single URL to interact with all the brokers in a cluster.
Broker​
A stateless component of Pulsar clusters that runs two other components: an HTTP server that exposes a REST interface for administration and topic lookup, and a dispatcher that handles all message transfers. Pulsar clusters typically consist of multiple brokers.
Dispatcher​
An asynchronous TCP server used for all data transfers in and out of a Pulsar broker. The Pulsar dispatcher uses a custom binary protocol for all communications.
Storage​
BookKeeper​
Apache BookKeeper is a scalable, low-latency persistent log storage service that Pulsar uses to store data.
Bookie​
Bookie is the name of an individual BookKeeper server. It is effectively the storage server of Pulsar.
Ledger​
An append-only data structure in BookKeeper that is used to persistently store messages in Pulsar topics.
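The append-only property can be illustrated with a tiny in-memory stand-in. The `Ledger` class below is a hypothetical sketch, not the BookKeeper API: entries receive sequential IDs as they are appended, existing entries are never modified, and a closed ledger accepts no further writes.

```python
class Ledger:
    """Toy append-only log; a conceptual stand-in for a BookKeeper ledger."""

    def __init__(self, ledger_id: int):
        self.ledger_id = ledger_id
        self._entries = []
        self._closed = False

    def append(self, data: bytes) -> int:
        """Append an entry and return its sequential entry ID."""
        if self._closed:
            raise RuntimeError("ledger is closed; no further appends allowed")
        self._entries.append(data)
        return len(self._entries) - 1

    def read(self, entry_id: int) -> bytes:
        """Read a previously appended entry by its ID."""
        return self._entries[entry_id]

    def close(self) -> None:
        """Seal the ledger so that its contents become immutable."""
        self._closed = True


if __name__ == "__main__":
    ledger = Ledger(ledger_id=1)
    first = ledger.append(b"message-1")
    second = ledger.append(b"message-2")
    ledger.close()
    print(first, second, ledger.read(first))
```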
Functions​
Pulsar Functions are lightweight functions that can consume messages from Pulsar topics, apply custom processing logic, and, if desired, publish results to topics.
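As a minimal sketch of the consume, process, publish idea, the function below takes the payload of one input message and returns a value that would be published to an output topic. It assumes the plain-function style in which a Python module exposes a `process` function; deployment details such as input and output topic configuration are outside the scope of this example.

```python
def process(input):
    """Append an exclamation mark to each incoming message payload."""
    # The return value is what would be published to the output topic.
    return "{}!".format(input)


if __name__ == "__main__":
    # Local stand-in for messages arriving on an input topic.
    for payload in ["hello", "pulsar"]:
        print(process(payload))
```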