Pulsar Metrics

Pulsar exposes metrics in Prometheus format that can be collected and used for monitoring the health of the cluster.
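
Because the output is plain Prometheus text, it can be inspected with a few lines of code. The following is a minimal sketch, not a full parser: it skips comment lines, ignores timestamps, and leaves label values unescaped. The names `Sample` and `parse_metrics` and the sample values are illustrative assumptions, not part of Pulsar.

```python
import re
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


# name, optional {labels}, then the sample value (a trailing timestamp is ignored)
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> List[Sample]:
    """Parse Prometheus text-format output into a list of samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # not a numeric sample, ignore it
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append(Sample(name, labels, value))
    return samples


# Made-up sample output; the metric names come from this document.
SAMPLE = '''\
# TYPE zookeeper_server_znode_count gauge
zookeeper_server_znode_count 42
pulsar_rate_in{cluster="standalone",namespace="public/default"} 3.5
'''

for sample in parse_metrics(SAMPLE):
    print(sample.name, sample.labels, sample.value)
```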

Overview

Pulsar exposes the following types of metrics:

  • Counter: a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.
  • Gauge: a metric that represents a single numerical value that can arbitrarily go up and down.
  • Histogram: a histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. The _bucket suffix marks the number of observations within a histogram bucket, selected with the label {le="<upper inclusive bound>"}. The _count suffix marks the number of observations; it is shown as a time series and behaves like a counter. The _sum suffix marks the sum of observed values; it is also shown as a time series and behaves like a counter. Together, these suffixes are denoted by _* in this document.
  • Summary: similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
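
Since a counter can drop back to zero when a process restarts, any computation over successive scrapes has to account for resets. The sketch below shows one common way to do this, treating every decrease as a reset. The function name `counter_increase` is an assumption made for illustration.

```python
from typing import Sequence


def counter_increase(values: Sequence[float]) -> float:
    """Total increase of a counter across successive scrapes.

    A value lower than its predecessor is treated as a restart:
    the counter began again at zero, so the new value is the increase.
    """
    total = 0.0
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # reset detected
    return total


# Scrapes 10 -> 15 (+5), restart, then 3 (+3) -> 7 (+4)
print(counter_increase([10, 15, 3, 7]))
```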

ZooKeeper

The ZooKeeper metrics are exposed under "/metrics" at port 8000. You can use a different port by configuring the stats_server_port system property.

Server metrics

Name | Type | Description
zookeeper_server_znode_count | Gauge | The number of z-nodes stored.
zookeeper_server_data_size_bytes | Gauge | The total size of all z-nodes stored.
zookeeper_server_connections | Gauge | The number of currently opened connections.
zookeeper_server_watches_count | Gauge | The number of watchers registered.
zookeeper_server_ephemerals_count | Gauge | The number of ephemeral z-nodes.

Request metrics

Name | Type | Description
zookeeper_server_requests | Counter | The total number of requests received by a particular server.
zookeeper_server_requests_latency_ms | Summary | The request latency in milliseconds. Available labels: type (write, read).
  • write: the requests that write data to ZooKeeper.
  • read: the requests that read data from ZooKeeper.

BookKeeper

The BookKeeper metrics are exposed under "/metrics" at port 8000. You can change the port by updating prometheusStatsHttpPort in the bookkeeper.conf configuration file.

Server metrics

Name | Type | Description
bookie_SERVER_STATUS | Gauge | The server status of the bookie.
  • 1: the bookie is running in writable mode.
  • 0: the bookie is running in readonly mode.
bookkeeper_server_ADD_ENTRY_count | Counter | The total number of ADD_ENTRY requests received at the bookie. The success label is used to distinguish successes and failures.
bookkeeper_server_READ_ENTRY_count | Counter | The total number of READ_ENTRY requests received at the bookie. The success label is used to distinguish successes and failures.
bookie_WRITE_BYTES | Counter | The total number of bytes written to the bookie.
bookie_READ_BYTES | Counter | The total number of bytes read from the bookie.
bookkeeper_server_ADD_ENTRY_REQUEST | Summary | The summary of request latency of ADD_ENTRY requests at the bookie. The success label is used to distinguish successes and failures.
bookkeeper_server_READ_ENTRY_REQUEST | Summary | The summary of request latency of READ_ENTRY requests at the bookie. The success label is used to distinguish successes and failures.
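
The success label makes it possible to derive a failure ratio from the request counters. The sketch below assumes that the label values are the strings "true" and "false", which this document does not state, and that samples are available as (name, labels, value) tuples. The function name `failure_ratio` is illustrative.

```python
from typing import Dict, Iterable, Tuple

SampleTuple = Tuple[str, Dict[str, str], float]


def failure_ratio(samples: Iterable[SampleTuple], metric: str) -> float:
    """Share of failed requests for one counter, split by the success label."""
    succeeded = failed = 0.0
    for name, labels, value in samples:
        if name != metric:
            continue
        # Assumption: success is reported as "true" or "false".
        if labels.get("success") == "true":
            succeeded += value
        elif labels.get("success") == "false":
            failed += value
    total = succeeded + failed
    return failed / total if total else 0.0


# Made-up values for illustration.
scraped = [
    ("bookkeeper_server_ADD_ENTRY_count", {"success": "true"}, 90.0),
    ("bookkeeper_server_ADD_ENTRY_count", {"success": "false"}, 10.0),
    ("bookkeeper_server_READ_ENTRY_count", {"success": "true"}, 5.0),
]
print(failure_ratio(scraped, "bookkeeper_server_ADD_ENTRY_count"))
```

Because the counters are cumulative, this ratio covers the whole period since the bookie started. A ratio over a recent window needs the difference between two scrapes.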

Journal metrics

Name | Type | Description
bookie_journal_JOURNAL_SYNC_count | Counter | The total number of journal fsync operations happening at the bookie. The success label is used to distinguish successes and failures.
bookie_journal_JOURNAL_QUEUE_SIZE | Gauge | The total number of requests pending in the journal queue.
bookie_journal_JOURNAL_FORCE_WRITE_QUEUE_SIZE | Gauge | The total number of force write (fsync) requests pending in the force-write queue.
bookie_journal_JOURNAL_CB_QUEUE_SIZE | Gauge | The total number of callbacks pending in the callback queue.
bookie_journal_JOURNAL_ADD_ENTRY | Summary | The summary of request latency of adding entries to the journal.
bookie_journal_JOURNAL_SYNC | Summary | The summary of fsync latency of syncing data to the journal disk.

Storage metrics

Name | Type | Description
bookie_ledgers_count | Gauge | The total number of ledgers stored in the bookie.
bookie_entries_count | Gauge | The total number of entries stored in the bookie.
bookie_write_cache_size | Gauge | The bookie write cache size (in bytes).
bookie_read_cache_size | Gauge | The bookie read cache size (in bytes).
bookie_DELETED_LEDGER_COUNT | Counter | The total number of ledgers deleted since the bookie has started.
bookie_ledger_writable_dirs | Gauge | The number of writable directories in the bookie.

Broker

The broker metrics are exposed under "/metrics" at port 8080. You can change the port by updating webServicePort in the broker.conf configuration file.

All the metrics exposed by a broker are labelled with cluster=${pulsar_cluster}. The value of ${pulsar_cluster} is the Pulsar cluster name you configured in broker.conf.

A broker exposes the following kinds of metrics:

Namespace metrics

Namespace metrics are only exposed when exposeTopicLevelMetricsInPrometheus is set to false.

All the namespace metrics are labelled with the following labels:

  • cluster: cluster=${pulsar_cluster}. ${pulsar_cluster} is the cluster name that you configured in broker.conf.
  • namespace: namespace=${pulsar_namespace}. ${pulsar_namespace} is the namespace name.

Name | Type | Description
pulsar_topics_count | Gauge | The number of Pulsar topics of the namespace owned by this broker.
pulsar_subscriptions_count | Gauge | The number of Pulsar subscriptions of the namespace served by this broker.
pulsar_producers_count | Gauge | The number of active producers of the namespace connected to this broker.
pulsar_consumers_count | Gauge | The number of active consumers of the namespace connected to this broker.
pulsar_rate_in | Gauge | The total message rate of the namespace coming into this broker (messages/second).
pulsar_rate_out | Gauge | The total message rate of the namespace going out from this broker (messages/second).
pulsar_throughput_in | Gauge | The total throughput of the namespace coming into this broker (bytes/second).
pulsar_throughput_out | Gauge | The total throughput of the namespace going out from this broker (bytes/second).
pulsar_storage_size | Gauge | The total storage size of the topics in this namespace owned by this broker (bytes).
pulsar_storage_backlog_size | Gauge | The total backlog size of the topics of this namespace owned by this broker (messages).
pulsar_storage_offloaded_size | Gauge | The total amount of the data in this namespace offloaded to the tiered storage (bytes).
pulsar_storage_write_rate | Gauge | The total message batches (entries) written to the storage for this namespace (message batches / second).
pulsar_storage_read_rate | Gauge | The total message batches (entries) read from the storage for this namespace (message batches / second).
pulsar_subscription_delayed | Gauge | The total number of message batches (entries) that are delayed for dispatching.
pulsar_storage_write_latency_le_* | Histogram | The rate of entries of a namespace whose storage write latency falls within a given threshold.
Available thresholds:
  • pulsar_storage_write_latency_le_0_5: <= 0.5ms
  • pulsar_storage_write_latency_le_1: <= 1ms
  • pulsar_storage_write_latency_le_5: <= 5ms
  • pulsar_storage_write_latency_le_10: <= 10ms
  • pulsar_storage_write_latency_le_20: <= 20ms
  • pulsar_storage_write_latency_le_50: <= 50ms
  • pulsar_storage_write_latency_le_100: <= 100ms
  • pulsar_storage_write_latency_le_200: <= 200ms
  • pulsar_storage_write_latency_le_1000: <= 1s
  • pulsar_storage_write_latency_le_overflow: > 1s
pulsar_entry_size_le_* | Histogram | The rate of entries of a namespace whose entry size falls within a given threshold.
Available thresholds:
  • pulsar_entry_size_le_128: <= 128 bytes
  • pulsar_entry_size_le_512: <= 512 bytes
  • pulsar_entry_size_le_1_kb: <= 1 KB
  • pulsar_entry_size_le_2_kb: <= 2 KB
  • pulsar_entry_size_le_4_kb: <= 4 KB
  • pulsar_entry_size_le_16_kb: <= 16 KB
  • pulsar_entry_size_le_100_kb: <= 100 KB
  • pulsar_entry_size_le_1_mb: <= 1 MB
  • pulsar_entry_size_le_overflow: > 1 MB
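
The threshold of each histogram series is encoded in the metric name rather than in a label, so a consumer of these metrics has to decode the suffix. The following sketch maps the names listed above to numeric upper bounds. The function names are illustrative, the overflow series is represented as infinity, and treating 1 KB as 1024 bytes is an assumption that this document does not confirm.

```python
import math

LATENCY_PREFIX = "pulsar_storage_write_latency_le_"
SIZE_PREFIX = "pulsar_entry_size_le_"
# Assumption: KB and MB are binary units (1024-based).
SIZE_UNITS = {"kb": 1024, "mb": 1024 * 1024}


def latency_bound_ms(metric_name: str) -> float:
    """Upper latency bound in milliseconds, e.g. ..._le_0_5 -> 0.5."""
    if not metric_name.startswith(LATENCY_PREFIX):
        raise ValueError(f"not a write latency series: {metric_name}")
    suffix = metric_name[len(LATENCY_PREFIX):]
    if suffix == "overflow":
        return math.inf
    return float(suffix.replace("_", "."))  # "0_5" encodes 0.5


def entry_size_bound_bytes(metric_name: str) -> float:
    """Upper entry size bound in bytes, e.g. ..._le_16_kb -> 16384.0."""
    if not metric_name.startswith(SIZE_PREFIX):
        raise ValueError(f"not an entry size series: {metric_name}")
    suffix = metric_name[len(SIZE_PREFIX):]
    if suffix == "overflow":
        return math.inf
    number, _, unit = suffix.partition("_")  # "128" has no unit part
    return float(number) * SIZE_UNITS.get(unit, 1)


print(latency_bound_ms("pulsar_storage_write_latency_le_0_5"))
print(entry_size_bound_bytes("pulsar_entry_size_le_16_kb"))
```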

Replication metrics

If a namespace is configured to be replicated between multiple Pulsar clusters, the corresponding replication metrics will also be exposed when replicationMetricsEnabled is enabled.

All the replication metrics will also be labelled with remoteCluster=${pulsar_remote_cluster}.

Name | Type | Description
pulsar_replication_rate_in | Gauge | The total message rate of the namespace replicating from remote cluster (messages/second).
pulsar_replication_rate_out | Gauge | The total message rate of the namespace replicating to remote cluster (messages/second).
pulsar_replication_throughput_in | Gauge | The total throughput of the namespace replicating from remote cluster (bytes/second).
pulsar_replication_throughput_out | Gauge | The total throughput of the namespace replicating to remote cluster (bytes/second).
pulsar_replication_backlog | Gauge | The total backlog of the namespace replicating to remote cluster (messages).
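
The remoteCluster label allows replication metrics to be grouped per destination. As a rough sketch, the function below sums pulsar_replication_backlog per remote cluster from samples given as (name, labels, value) tuples. The function name, cluster names, and values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SampleTuple = Tuple[str, Dict[str, str], float]


def backlog_by_remote_cluster(samples: Iterable[SampleTuple]) -> Dict[str, float]:
    """Sum the replication backlog (messages) per remoteCluster label value."""
    totals: Dict[str, float] = defaultdict(float)
    for name, labels, value in samples:
        if name == "pulsar_replication_backlog":
            totals[labels.get("remoteCluster", "")] += value
    return dict(totals)


# Hypothetical clusters and namespaces.
scraped = [
    ("pulsar_replication_backlog", {"namespace": "a/ns1", "remoteCluster": "us-west"}, 120.0),
    ("pulsar_replication_backlog", {"namespace": "a/ns2", "remoteCluster": "us-west"}, 30.0),
    ("pulsar_replication_backlog", {"namespace": "a/ns1", "remoteCluster": "eu"}, 7.0),
    ("pulsar_replication_rate_out", {"namespace": "a/ns1", "remoteCluster": "eu"}, 99.0),
]
print(backlog_by_remote_cluster(scraped))
```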

Topic metrics

Topic metrics are only exposed when exposeTopicLevelMetricsInPrometheus is set to true.

All the topic metrics are labelled with the following labels:

  • cluster: cluster=${pulsar_cluster}. ${pulsar_cluster} is the cluster name that you configured in broker.conf.
  • namespace: namespace=${pulsar_namespace}. ${pulsar_namespace} is the namespace name.
  • topic: topic=${pulsar_topic}. ${pulsar_topic} is the topic name.

Name | Type | Description
pulsar_subscriptions_count | Gauge | The number of Pulsar subscriptions of the topic served by this broker.
pulsar_producers_count | Gauge | The number of active producers of the topic connected to this broker.
pulsar_consumers_count | Gauge | The number of active consumers of the topic connected to this broker.
pulsar_rate_in | Gauge | The total message rate of the topic coming into this broker (messages/second).
pulsar_rate_out | Gauge | The total message rate of the topic going out from this broker (messages/second).
pulsar_throughput_in | Gauge | The total throughput of the topic coming into this broker (bytes/second).
pulsar_throughput_out | Gauge | The total throughput of the topic going out from this broker (bytes/second).
pulsar_storage_size | Gauge | The total storage size of this topic owned by this broker (bytes).
pulsar_storage_backlog_size | Gauge | The total backlog size of this topic owned by this broker (messages).
pulsar_storage_offloaded_size | Gauge | The total amount of the data in this topic offloaded to the tiered storage (bytes).
pulsar_storage_backlog_quota_limit | Gauge | The backlog quota limit that applies to the data in this topic (bytes).
pulsar_storage_write_rate | Gauge | The total message batches (entries) written to the storage for this topic (message batches / second).
pulsar_storage_read_rate | Gauge | The total message batches (entries) read from the storage for this topic (message batches / second).
pulsar_subscription_delayed | Gauge | The total number of message batches (entries) that are delayed for dispatching.
pulsar_storage_write_latency_le_* | Histogram | The rate of entries of a topic whose storage write latency falls within a given threshold.
Available thresholds:
  • pulsar_storage_write_latency_le_0_5: <= 0.5ms
  • pulsar_storage_write_latency_le_1: <= 1ms
  • pulsar_storage_write_latency_le_5: <= 5ms
  • pulsar_storage_write_latency_le_10: <= 10ms
  • pulsar_storage_write_latency_le_20: <= 20ms
  • pulsar_storage_write_latency_le_50: <= 50ms
  • pulsar_storage_write_latency_le_100: <= 100ms
  • pulsar_storage_write_latency_le_200: <= 200ms
  • pulsar_storage_write_latency_le_1000: <= 1s
  • pulsar_storage_write_latency_le_overflow: > 1s
pulsar_entry_size_le_* | Histogram | The rate of entries of a topic whose entry size falls within a given threshold.
Available thresholds:
  • pulsar_entry_size_le_128: <= 128 bytes
  • pulsar_entry_size_le_512: <= 512 bytes
  • pulsar_entry_size_le_1_kb: <= 1 KB
  • pulsar_entry_size_le_2_kb: <= 2 KB
  • pulsar_entry_size_le_4_kb: <= 4 KB
  • pulsar_entry_size_le_16_kb: <= 16 KB
  • pulsar_entry_size_le_100_kb: <= 100 KB
  • pulsar_entry_size_le_1_mb: <= 1 MB
  • pulsar_entry_size_le_overflow: > 1 MB
pulsar_in_bytes_total | Counter | The total number of bytes received for this topic.
pulsar_in_messages_total | Counter | The total number of messages received for this topic.

Replication metrics

If a namespace that a topic belongs to is configured to be replicated between multiple Pulsar clusters, the corresponding replication metrics will also be exposed when replicationMetricsEnabled is enabled.

All the replication metrics will also be labelled with remoteCluster=${pulsar_remote_cluster}.

Name | Type | Description
pulsar_replication_rate_in | Gauge | The total message rate of the topic replicating from remote cluster (messages/second).
pulsar_replication_rate_out | Gauge | The total message rate of the topic replicating to remote cluster (messages/second).
pulsar_replication_throughput_in | Gauge | The total throughput of the topic replicating from remote cluster (bytes/second).
pulsar_replication_throughput_out | Gauge | The total throughput of the topic replicating to remote cluster (bytes/second).
pulsar_replication_backlog | Gauge | The total backlog of the topic replicating to remote cluster (messages).

Subscription metrics

Subscription metrics are only exposed when exposeTopicLevelMetricsInPrometheus is set to true.

All the subscription metrics are labelled with the following labels:

  • cluster: cluster=${pulsar_cluster}. ${pulsar_cluster} is the cluster name that you configured in broker.conf.
  • namespace: namespace=${pulsar_namespace}. ${pulsar_namespace} is the namespace name.
  • topic: topic=${pulsar_topic}. ${pulsar_topic} is the topic name.
  • subscription: subscription=${subscription}. ${subscription} is the topic subscription name.

Name | Type | Description
pulsar_subscription_back_log | Gauge | The total backlog of a subscription (messages).
pulsar_subscription_delayed | Gauge | The total number of messages that are delayed to be dispatched for a subscription (messages).
pulsar_subscription_msg_rate_redeliver | Gauge | The total rate of messages being redelivered (messages/second).
pulsar_subscription_unacked_messages | Gauge | The total number of unacknowledged messages of a subscription (messages).
pulsar_subscription_blocked_on_unacked_messages | Gauge | Indicates whether a subscription is blocked on unacknowledged messages.
  • 1 means the subscription is blocked, waiting for unacknowledged messages to be acked.
  • 0 means the subscription is not blocked on unacknowledged messages.
pulsar_subscription_msg_rate_out | Gauge | The total message dispatch rate for a subscription (messages/second).
pulsar_subscription_msg_throughput_out | Gauge | The total message dispatch throughput for a subscription (bytes/second).
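
A typical use of these gauges is to list the subscriptions that are currently blocked. The sketch below filters samples, given as (name, labels, value) tuples, for pulsar_subscription_blocked_on_unacked_messages equal to 1. The function name, topic names, and values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

SampleTuple = Tuple[str, Dict[str, str], float]
BLOCKED = "pulsar_subscription_blocked_on_unacked_messages"


def blocked_subscriptions(samples: Iterable[SampleTuple]) -> List[Tuple[str, str]]:
    """Return sorted (topic, subscription) pairs whose blocked gauge is 1."""
    blocked = []
    for name, labels, value in samples:
        if name == BLOCKED and value == 1:
            blocked.append((labels.get("topic", ""), labels.get("subscription", "")))
    return sorted(blocked)


# Made-up samples for illustration.
scraped = [
    (BLOCKED, {"topic": "persistent://t/ns/orders", "subscription": "billing"}, 1.0),
    (BLOCKED, {"topic": "persistent://t/ns/orders", "subscription": "audit"}, 0.0),
    ("pulsar_subscription_back_log", {"topic": "persistent://t/ns/orders", "subscription": "billing"}, 5000.0),
]
print(blocked_subscriptions(scraped))
```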

Consumer metrics

Consumer metrics are only exposed when both exposeTopicLevelMetricsInPrometheus and exposeConsumerLevelMetricsInPrometheus are set to true.
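
The exposure rules stated for namespace, topic, subscription, and consumer metrics can be summarized in a small decision function. This is a sketch of the rules as written in this document, not of the broker's implementation. The function name is illustrative, and replication metrics are left out because they depend on replicationMetricsEnabled.

```python
from typing import Set


def exposed_metric_groups(topic_level: bool, consumer_level: bool) -> Set[str]:
    """Metric groups exposed for the given broker settings.

    topic_level    -> exposeTopicLevelMetricsInPrometheus
    consumer_level -> exposeConsumerLevelMetricsInPrometheus
    """
    if not topic_level:
        # Namespace metrics are exposed only when topic-level metrics are off.
        return {"namespace"}
    groups = {"topic", "subscription"}
    if consumer_level:
        # Consumer metrics need both settings to be true.
        groups.add("consumer")
    return groups


for topic_level in (False, True):
    for consumer_level in (False, True):
        print(topic_level, consumer_level, sorted(exposed_metric_groups(topic_level, consumer_level)))
```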

All the consumer metrics are labelled with the following labels:

  • cluster: cluster=${pulsar_cluster}. ${pulsar_cluster} is the cluster name that you configured in broker.conf.
  • namespace: namespace=${pulsar_namespace}. ${pulsar_namespace} is the namespace name.
  • topic: topic=${pulsar_topic}. ${pulsar_topic} is the topic name.
  • subscription: subscription=${subscription}. ${subscription} is the topic subscription name.
  • consumer_name: consumer_name=${consumer_name}. ${consumer_name} is the topic consumer name.
  • consumer_id: consumer_id=${consumer_id}. ${consumer_id} is the topic consumer id.

Name | Type | Description
pulsar_consumer_msg_rate_redeliver | Gauge | The total rate of messages being redelivered (messages/second).
pulsar_consumer_unacked_messages | Gauge | The total number of unacknowledged messages of a consumer (messages).
pulsar_consumer_blocked_on_unacked_messages | Gauge | Indicates whether a consumer is blocked on unacknowledged messages.
  • 1 means the consumer is blocked, waiting for unacknowledged messages to be acked.
  • 0 means the consumer is not blocked on unacknowledged messages.
pulsar_consumer_msg_rate_out | Gauge | The total message dispatch rate for a consumer (messages/second).
pulsar_consumer_msg_throughput_out | Gauge | The total message dispatch throughput for a consumer (bytes/second).
pulsar_consumer_available_permits | Gauge | The available permits for a consumer.

Jetty metrics

For a functions-worker running separately from brokers, its Jetty metrics are only exposed when includeStandardPrometheusMetrics is set to true.

All the jetty metrics are labelled with the following labels:

  • cluster: cluster=${pulsar_cluster}. ${pulsar_cluster} is the cluster name that you have configured in the broker.conf file.

Name | Type | Description
jetty_requests_total | Counter | Number of requests.
jetty_requests_active | Gauge | Number of requests currently active.
jetty_requests_active_max | Gauge | Maximum number of requests that have been active at once.
jetty_request_time_max_seconds | Gauge | Maximum time spent handling requests.
jetty_request_time_seconds_total | Counter | Total time spent in all request handling.
jetty_dispatched_total | Counter | Number of dispatches.
jetty_dispatched_active | Gauge | Number of dispatches currently active.
jetty_dispatched_active_max | Gauge | Maximum number of active dispatches being handled.
jetty_dispatched_time_max | Gauge | Maximum time spent in dispatch handling.
jetty_dispatched_time_seconds_total | Counter | Total time spent in dispatch handling.
jetty_async_requests_total | Counter | Total number of async requests.
jetty_async_requests_waiting | Gauge | Currently waiting async requests.
jetty_async_requests_waiting_max | Gauge | Maximum number of waiting async requests.
jetty_async_dispatches_total | Counter | Number of requests that have been asynchronously dispatched.
jetty_expires_total | Counter | Number of async requests that have expired.
jetty_responses_total | Counter | Number of responses, labeled by status code. The code label can be "1xx", "2xx", "3xx", "4xx", or "5xx".
jetty_stats_seconds | Gauge | Time in seconds stats have been collected for.
jetty_responses_bytes_total | Counter | Total number of bytes across all responses.
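
Two derived values are often useful here: the mean request handling time, obtained by dividing jetty_request_time_seconds_total by jetty_requests_total, and the share of "5xx" responses in jetty_responses_total. The sketch below computes both from already-scraped values. The function names and the numbers are illustrative assumptions.

```python
from typing import Dict


def mean_request_seconds(requests_total: float, request_time_seconds_total: float) -> float:
    """Average handling time per request since the counters started."""
    if not requests_total:
        return 0.0
    return request_time_seconds_total / requests_total


def server_error_ratio(responses_by_code: Dict[str, float]) -> float:
    """Share of responses whose code label is "5xx"."""
    total = sum(responses_by_code.values())
    if not total:
        return 0.0
    return responses_by_code.get("5xx", 0.0) / total


# Made-up counter values.
print(mean_request_seconds(200, 5.0))
print(server_error_ratio({"2xx": 95, "4xx": 3, "5xx": 2}))
```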

Monitor

You can set up a Prometheus instance to collect all the metrics exposed at Pulsar components and set up Grafana dashboards to display the metrics and monitor your Pulsar cluster.
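
When assembling scrape targets for such a Prometheus instance, the endpoints follow from the defaults given in this document: "/metrics" on port 8000 for ZooKeeper and BookKeeper, and on port 8080 for brokers. The following is a small sketch that builds the URLs. The http scheme, the function name, and the host names are assumptions for illustration.

```python
from typing import Optional

# Default metrics ports as stated in this document.
DEFAULT_PORTS = {"zookeeper": 8000, "bookkeeper": 8000, "broker": 8080}


def metrics_url(component: str, host: str, port: Optional[int] = None) -> str:
    """Build the metrics URL for one Pulsar component instance.

    Pass port explicitly if stats_server_port, prometheusStatsHttpPort,
    or webServicePort was changed from its default.
    """
    if component not in DEFAULT_PORTS:
        raise ValueError(f"unknown component: {component}")
    chosen_port = DEFAULT_PORTS[component] if port is None else port
    return f"http://{host}:{chosen_port}/metrics"  # assumption: plain HTTP


# Hypothetical host names.
for component, host in [("zookeeper", "zk-0.example"), ("bookkeeper", "bookie-0.example"), ("broker", "broker-0.example")]:
    print(metrics_url(component, host))
```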

The following are some examples of Grafana dashboards:

  • pulsar-grafana: A Grafana dashboard that displays metrics collected in Prometheus for Pulsar clusters running on Kubernetes.
  • apache-pulsar-grafana-dashboard: A collection of Grafana dashboard templates for different Pulsar components running on both Kubernetes and on-premise machines.