Deploying a Pulsar instance
A Pulsar instance consists of multiple Pulsar clusters working in unison. Clusters can be distributed across data centers or geographical regions and can replicate amongst themselves using geo-replication. Deploying a multi-cluster Pulsar instance involves the following basic steps:
- Deploying two separate ZooKeeper quorums: a local quorum for each cluster in the instance and a global quorum for instance-wide tasks
- Initializing cluster metadata for each cluster
- Deploying a BookKeeper cluster of bookies in each Pulsar cluster
- Deploying brokers in each Pulsar cluster
If you’re deploying a single Pulsar cluster, see the Clusters and Brokers guide.
Running Pulsar locally or on Kubernetes?
This guide shows you how to deploy Pulsar in production in a non-Kubernetes environment. If you’d like to run a standalone Pulsar cluster on a single machine for development purposes, see the Setting up a local cluster guide. If you’re looking to run Pulsar on Kubernetes, see the Pulsar on Kubernetes guide, which includes sections on running Pulsar on Kubernetes on Google Container Engine and on Amazon Web Services.
Pulsar is currently available for macOS and Linux. To use Pulsar, you’ll need to install Java 8.
To get started running Pulsar, download a binary tarball release in one of the following ways:
- from the Pulsar downloads page
- from the Pulsar releases page
- using wget:
# Source release
$ wget http://archive.apache.org/dist/incubator/pulsar/pulsar-1.20.0-incubating/apache-pulsar-1.20.0-incubating-src.tar.gz
# Binary release
$ wget http://archive.apache.org/dist/incubator/pulsar/pulsar-1.20.0-incubating/apache-pulsar-1.20.0-incubating-bin.tar.gz
Once the tarball is downloaded, untar it and cd into the resulting directory:
# Source release
$ tar xvfz apache-pulsar-1.20.0-incubating-src.tar.gz
$ cd apache-pulsar-1.20.0-incubating
# Binary release
$ tar xvfz apache-pulsar-1.20.0-incubating-bin.tar.gz
$ cd apache-pulsar-1.20.0-incubating
What your package contains
Both the source and binary packages contain the following directories:
| Directory | Contains |
|---|---|
| bin | Pulsar’s command-line tools, such as pulsar and pulsar-admin |
| conf | Configuration files for Pulsar, including for broker configuration, ZooKeeper configuration, and more |
| data | The data storage directory used by ZooKeeper and BookKeeper |
| lib | The JAR files used by Pulsar |
| logs | Logs created by the installation |
The source package contains all of the assets, specific to version 1.20.0-incubating, from the Pulsar repository.
Compiling from source
If you’ve downloaded a source release and would like to compile it, you’ll need to have JDK 8 and Maven installed. To run the scripts in the bin directory, you’ll need to have the Java SE Runtime Environment installed (JDK 8 already includes this).
To compile, skipping the tests:
$ mvn install -DskipTests
Each Pulsar instance relies on two separate ZooKeeper quorums.
- Local ZooKeeper operates at the cluster level and provides cluster-specific configuration management and coordination. Each Pulsar cluster needs to have a dedicated ZooKeeper cluster.
- Global ZooKeeper operates at the instance level and provides configuration management for the entire system (and thus across clusters). The global ZooKeeper quorum can be provided by an independent cluster of machines or by the same machines used by local ZooKeeper.
Deploying local ZooKeeper
ZooKeeper manages a variety of essential coordination- and configuration-related tasks for Pulsar.
Deploying a Pulsar instance requires you to stand up one local ZooKeeper cluster per Pulsar cluster.
To begin, add all ZooKeeper servers to the quorum configuration specified in the conf/zookeeper.conf file. Add a server.N line for each node in the cluster to the configuration, where N is the number of the ZooKeeper node. Here’s an example for a three-node cluster:
server.1=zk1.us-west.example.com:2888:3888
server.2=zk2.us-west.example.com:2888:3888
server.3=zk3.us-west.example.com:2888:3888
On each host, you need to specify the ID of the node in that node’s myid file, which is in each server’s data/zookeeper folder by default (this can be changed via the dataDir parameter). See the Multi-server setup guide in the ZooKeeper documentation for detailed info on myid and more.
On a ZooKeeper server at zk1.us-west.example.com, for example, you could set the myid value like this:
$ mkdir -p data/zookeeper
$ echo 1 > data/zookeeper/myid
On zk2.us-west.example.com the command would be echo 2 > data/zookeeper/myid, and so on.
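Because the server.N entries and the myid values share one numbering, they can be derived from the same host list. The following is a minimal Python sketch, assuming the three example hostnames above and the 2888/3888 ports shown in the sample configuration. The function names are illustrative and are not part of Pulsar or ZooKeeper.

```python
def quorum_lines(hosts, peer_port=2888, election_port=3888):
    """Return one server.N line per host, numbering nodes from 1."""
    return [
        f"server.{n}={host}:{peer_port}:{election_port}"
        for n, host in enumerate(hosts, start=1)
    ]


def myid_for(hosts, host):
    """Return the myid file content for the given host (its 1-based position)."""
    return str(hosts.index(host) + 1)


# Example hostnames taken from the configuration above.
hosts = [
    "zk1.us-west.example.com",
    "zk2.us-west.example.com",
    "zk3.us-west.example.com",
]

for line in quorum_lines(hosts):
    print(line)
```

Keeping a single ordered host list as the source for both outputs is one way to avoid a myid that disagrees with its server.N entry.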
Once each server has been added to the zookeeper.conf configuration and has the appropriate myid entry, you can start ZooKeeper on all hosts (in the background, using nohup) with the pulsar-daemon CLI tool:
$ bin/pulsar-daemon start zookeeper
Deploying global ZooKeeper
The ZooKeeper cluster configured and started up in the section above is a local ZooKeeper cluster used to manage a single Pulsar cluster. In addition to a local cluster, however, a full Pulsar instance also requires a global ZooKeeper quorum for handling some instance-level configuration and coordination tasks.
If you’re deploying a single-cluster instance, then you will not need a separate cluster for global ZooKeeper. If, however, you’re deploying a multi-cluster instance, then you should stand up a separate ZooKeeper cluster for instance-level tasks.
Single-cluster Pulsar instance
If your Pulsar instance will consist of just one cluster, then you can deploy global ZooKeeper on the same machines as the local ZooKeeper quorum but running on different TCP ports.
To deploy global ZooKeeper in a single-cluster instance, add the same ZooKeeper servers used by the local quorum to the configuration file in conf/global_zookeeper.conf using the same method as for local ZooKeeper, but make sure to use a different port (2181 is the default for ZooKeeper). Here’s an example that uses port 2184 for a three-node ZooKeeper cluster:
clientPort=2184
server.1=zk1.us-west.example.com:2185:2186
server.2=zk2.us-west.example.com:2185:2186
server.3=zk3.us-west.example.com:2185:2186
As before, create the myid file for each server, this time in the data directory used by global ZooKeeper.
Multi-cluster Pulsar instance
When deploying a global Pulsar instance, with clusters distributed across different geographical regions, the global ZooKeeper serves as a highly available and strongly consistent metadata store that can tolerate failures and partitions spanning whole regions.
The key here is to make sure the ZK quorum members are spread across at least 3 regions and that other regions are running as observers.
Again, given the very low expected load on the global ZooKeeper servers, we can share the same hosts used for the local ZooKeeper quorum.
For example, let’s assume a Pulsar instance with the following clusters: us-west, us-central, us-east, eu-central, and ap-south. Let’s also assume that each cluster has its own local ZK servers, named like zk1.us-west.example.com, zk2.us-west.example.com, and zk3.us-west.example.com.
In this scenario we want to pick the quorum participants from a few clusters and let all the others be ZK observers. For example, to form a 7-server quorum, we can pick 3 servers from us-west, 2 from us-central, and 2 from us-east.
This will guarantee that writes to global ZooKeeper will be possible even if one of these regions is unreachable.
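The claim can be checked with simple majority arithmetic: ZooKeeper needs a strict majority of voting members to accept writes, and observers do not count toward it. The following Python sketch is illustrative only and uses the 3/2/2 split from the example.

```python
# Voting members per region, taken from the example above (observers excluded).
voters = {"us-west": 3, "us-central": 2, "us-east": 2}


def survives_region_loss(voters):
    """Return True if a strict majority remains after losing any single region."""
    total = sum(voters.values())
    majority = total // 2 + 1  # 4 of 7 in the example
    return all(total - count >= majority for count in voters.values())


print(survives_region_loss(voters))
```

With 7 voters the majority is 4, and even losing us-west, the largest region, leaves exactly 4 voters. A 4/2/1 split would not pass the same check, since losing the 4-server region leaves only 3.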
The ZK configuration in all the servers will look like:
clientPort=2184
server.1=zk1.us-west.example.com:2185:2186
server.2=zk2.us-west.example.com:2185:2186
server.3=zk3.us-west.example.com:2185:2186
server.4=zk1.us-central.example.com:2185:2186
server.5=zk2.us-central.example.com:2185:2186
server.6=zk3.us-central.example.com:2185:2186:observer
server.7=zk1.us-east.example.com:2185:2186
server.8=zk2.us-east.example.com:2185:2186
server.9=zk3.us-east.example.com:2185:2186:observer
server.10=zk1.eu-central.example.com:2185:2186:observer
server.11=zk2.eu-central.example.com:2185:2186:observer
server.12=zk3.eu-central.example.com:2185:2186:observer
server.13=zk1.ap-south.example.com:2185:2186:observer
server.14=zk2.ap-south.example.com:2185:2186:observer
server.15=zk3.ap-south.example.com:2185:2186:observer
Additionally, ZK observers will need to have the following line in their configuration:
peerType=observer
Starting the service
Once your global ZooKeeper configuration is in place, you can start up the service using the pulsar-daemon CLI tool:
$ bin/pulsar-daemon start global-zookeeper
Cluster metadata initialization
Once you’ve set up local and global ZooKeeper for your instance, there is some metadata that needs to be written to ZooKeeper for each cluster in your instance. It only needs to be written once.
$ bin/pulsar initialize-cluster-metadata \
  --cluster us-west \
  --zookeeper zk1.us-west.example.com:2181 \
  --global-zookeeper zk1.us-west.example.com:2184 \
  --web-service-url http://pulsar.us-west.example.com:8080/ \
  --web-service-url-tls https://pulsar.us-west.example.com:8443/ \
  --broker-service-url pulsar://pulsar.us-west.example.com:6650/ \
  --broker-service-url-tls pulsar+ssl://pulsar.us-west.example.com:6651/
As you can see from the example above, the following needs to be specified:
- The name of the cluster
- The local ZooKeeper connection string for the cluster
- The global ZooKeeper connection string for the entire instance
- The web service URL for the cluster
- A broker service URL enabling interaction with the brokers in the cluster
In each Pulsar instance, there is a global cluster that you can administer just like other clusters. The global cluster enables you to do things like create global topics.
If you’re using TLS, you’ll also need to specify a TLS web service URL for the cluster as well as a TLS broker service URL for the brokers in the cluster.
Make sure to run initialize-cluster-metadata for each cluster in your instance.
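Since the command has to be repeated for every cluster, it can help to generate the argument list rather than edit it by hand. The Python sketch below only builds and prints the commands. It assumes, purely for illustration, that every cluster follows the zk1.CLUSTER.example.com and pulsar.CLUSTER.example.com naming used for us-west above and that each cluster points at a global ZooKeeper server on its own hosts. The helper name is hypothetical.

```python
import shlex


def init_metadata_command(cluster, domain="example.com"):
    """Build the initialize-cluster-metadata argument list for one cluster.

    Hostnames are derived from an assumed naming pattern; adjust to your setup.
    """
    zk = f"zk1.{cluster}.{domain}"
    host = f"pulsar.{cluster}.{domain}"
    return [
        "bin/pulsar", "initialize-cluster-metadata",
        "--cluster", cluster,
        "--zookeeper", f"{zk}:2181",
        "--global-zookeeper", f"{zk}:2184",
        "--web-service-url", f"http://{host}:8080/",
        "--web-service-url-tls", f"https://{host}:8443/",
        "--broker-service-url", f"pulsar://{host}:6650/",
        "--broker-service-url-tls", f"pulsar+ssl://{host}:6651/",
    ]


# Print one command per cluster so it can be reviewed before being run.
for cluster in ["us-west", "us-central", "us-east"]:
    print(shlex.join(init_metadata_command(cluster)))
```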
BookKeeper provides persistent message storage for Pulsar.
Each Pulsar cluster needs to have its own cluster of bookies. The BookKeeper cluster shares a local ZooKeeper quorum with the Pulsar cluster.
BookKeeper bookies can be configured using the conf/bookkeeper.conf configuration file. The most important aspect of configuring each bookie is ensuring that the zkServers parameter is set to the connection string for the Pulsar cluster’s local ZooKeeper.
Starting up bookies
You can start up a bookie in two ways: in the foreground or as a background daemon.
To start up a bookie as a background daemon, use the pulsar-daemon CLI tool:
$ bin/pulsar-daemon start bookie
You can verify that the bookie is working properly using the bookiesanity command for the BookKeeper shell:
$ bin/bookkeeper shell bookiesanity
This will create a new ledger on the local bookie, write a few entries, read them back and finally delete the ledger.
Bookie hosts are responsible for storing message data on disk. In order for bookies to provide optimal performance, it’s essential that they have a suitable hardware configuration. There are two key dimensions to bookie hardware capacity:
- Disk I/O capacity (read/write)
- Storage capacity
Message entries written to bookies are always synced to disk before returning an acknowledgement to the Pulsar broker. To ensure low write latency, BookKeeper is designed to use multiple devices:
- A journal to ensure durability. For sequential writes, it’s critical to have fast fsync operations on bookie hosts. Typically, small and fast solid-state drives (SSDs) should suffice, or hard disk drives (HDDs) with a RAID controller and a battery-backed write cache. Both solutions can reach fsync latency of ~0.4 ms.
- A ledger storage device, where data is stored until all consumers have acknowledged the message. Writes happen in the background, so write I/O is not a big concern. Reads happen sequentially most of the time, and the backlog is read only when consumers drain it. To store large amounts of data, a typical configuration involves multiple HDDs with a RAID controller.
Once you’ve set up ZooKeeper, initialized cluster metadata, and spun up BookKeeper bookies, you can deploy brokers.
Brokers can be configured using the conf/broker.conf configuration file.
The most important element of broker configuration is ensuring that each broker is aware of its local ZooKeeper quorum as well as the global ZooKeeper quorum. Make sure that you set the zookeeperServers parameter to reflect the local quorum and the globalZookeeperServers parameter to reflect the global quorum (although you’ll need to specify only those global ZooKeeper servers located in the same cluster).
You also need to specify the name of the cluster to which the broker belongs using the clusterName parameter.
Here’s an example configuration:
# Local ZooKeeper servers
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
# Global Zookeeper quorum connection string.
globalZookeeperServers=zk1.us-west.example.com:2184,zk2.us-west.example.com:2184,zk3.us-west.example.com:2184
clusterName=us-west
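Both connection strings are comma-separated host:port lists that differ only in the port. As a rough illustration, the following Python sketch derives the three broker settings from a host list. It assumes the local quorum listens on 2181 and the global quorum on 2184, as in the example, and that both run on the same hosts. The helper names are hypothetical.

```python
def connection_string(hosts, port):
    """Join hosts into a ZooKeeper connection string such as h1:2181,h2:2181."""
    return ",".join(f"{host}:{port}" for host in hosts)


def broker_settings(cluster, zk_hosts, local_port=2181, global_port=2184):
    """Return the broker.conf settings discussed above as a dict."""
    return {
        "zookeeperServers": connection_string(zk_hosts, local_port),
        "globalZookeeperServers": connection_string(zk_hosts, global_port),
        "clusterName": cluster,
    }


zk_hosts = [f"zk{n}.us-west.example.com" for n in (1, 2, 3)]
for key, value in broker_settings("us-west", zk_hosts).items():
    print(f"{key}={value}")
```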
Pulsar brokers do not require any special hardware since they don’t use the local disk. Fast CPUs and a 10 Gbps NIC are recommended, since the software can take full advantage of them.
Starting the broker service
You can start a broker in the background using the pulsar-daemon CLI tool:
$ bin/pulsar-daemon start broker
You can also start brokers in the foreground using pulsar broker:
$ bin/pulsar broker
Clients connecting to Pulsar brokers need to be able to communicate with an entire Pulsar instance using a single URL. Pulsar provides a built-in service discovery mechanism that you can set up using the instructions immediately below.
You can also use your own service discovery system if you’d like. If you use your own system, there is just one requirement: when a client performs an HTTP request to an endpoint for a Pulsar cluster, such as http://pulsar.us-west.example.com:8080, the client needs to be redirected to some active broker in the desired cluster, whether via DNS, an HTTP or IP redirect, or some other means.
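To make the requirement concrete, here is a minimal Python sketch of one possible approach for a custom discovery system: choosing the target of an HTTP redirect by rotating through a list of active brokers. This is not Pulsar's built-in discovery service. The broker hostnames and request path are hypothetical, and a real system would also need to track which brokers are actually alive.

```python
import itertools


class RoundRobinRedirector:
    """Pick redirect targets by cycling through the known active brokers."""

    def __init__(self, active_brokers):
        if not active_brokers:
            raise ValueError("at least one active broker is required")
        self._brokers = itertools.cycle(active_brokers)

    def location_for(self, path):
        """Return the URL to send back in the redirect's Location header."""
        return f"http://{next(self._brokers)}{path}"


# Hypothetical broker addresses for the us-west cluster.
redirector = RoundRobinRedirector([
    "broker1.us-west.example.com:8080",
    "broker2.us-west.example.com:8080",
])
print(redirector.location_for("/some/request/path"))
```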
Service discovery already provided by many scheduling systems
Many large-scale deployment systems, such as Kubernetes, have service discovery systems built in. If you’re running Pulsar on such a system, you may not need to provide your own service discovery mechanism.
Service discovery setup
The service discovery mechanism included with Pulsar maintains a list of active brokers, stored in ZooKeeper, and supports lookup using HTTP and also Pulsar’s binary protocol.
To get started setting up Pulsar’s built-in service discovery, you need to change a few parameters in the conf/discovery.conf configuration file. Set the zookeeperServers parameter to the cluster’s local ZooKeeper quorum connection string and the globalZookeeperServers parameter to the global ZooKeeper quorum connection string:
# Zookeeper quorum connection string
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
# Global zookeeper quorum connection string
globalZookeeperServers=zk1.us-west.example.com:2184,zk2.us-west.example.com:2184,zk3.us-west.example.com:2184
To start the discovery service:
$ bin/pulsar-daemon start discovery
Admin client and verification
At this point your Pulsar instance should be ready to use. You can now configure client machines that can serve as administrative clients for each cluster. You can use the conf/client.conf configuration file to configure admin clients.
The most important thing is that you point the serviceUrl parameter to the correct service URL for the cluster, for example:
serviceUrl=http://pulsar.us-west.example.com:8080/
Provisioning new tenants
Pulsar was built as a fundamentally multi-tenant system. New tenants can be provisioned as Pulsar properties.
To allow a new tenant to use the system, we need to create a new property. You can create a new property using the pulsar-admin CLI tool:
$ bin/pulsar-admin properties create test \
  --allowed-clusters us-west \
  --admin-roles test-admin-role
This will allow users who identify with the role test-admin-role to administer the configuration for the property test, which will only be allowed to use the cluster us-west. From now on, this tenant will be able to self-manage its resources.
Once a tenant has been created, you will need to create namespaces for topics within that property.
The first step is to create a namespace. A namespace is an administrative unit that can contain many topics. Common practice is to create a namespace for each different use case from a single tenant.
$ bin/pulsar-admin namespaces create test/us-west/ns1
Testing producer and consumer
Everything is now ready to send and receive messages. The quickest way to test the system is through the pulsar-perf client tool.
Let’s use a topic in the namespace we just created. Topics are automatically created the first time a producer or a consumer tries to use them.
The topic name in this case could be persistent://test/us-west/ns1/my-topic.
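The topic names used in this guide are made up of the property, the cluster, the namespace, and a topic of your choosing. The Python sketch below assembles such a name from its parts. The helper is hypothetical and only mirrors the pattern visible in the commands in this section.

```python
def topic_name(prop, cluster, namespace, topic, domain="persistent"):
    """Build a topic name of the form domain://property/cluster/namespace/topic."""
    parts = [prop, cluster, namespace, topic]
    if any(not part or "/" in part for part in parts):
        raise ValueError("each part must be non-empty and must not contain '/'")
    return f"{domain}://" + "/".join(parts)


# Property, cluster, and namespace match the ones created earlier in this guide.
print(topic_name("test", "us-west", "ns1", "my-topic"))
```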
Start a consumer that will create a subscription on the topic and will wait for messages:
$ bin/pulsar-perf consume persistent://test/us-west/ns1/my-topic
Start a producer that publishes messages at a fixed rate and reports stats every 10 seconds:
$ bin/pulsar-perf produce persistent://test/us-west/ns1/my-topic
To report the topic stats:
$ bin/pulsar-admin persistent stats persistent://test/us-west/ns1/my-topic