If you are running Pulsar in a bare metal cluster, make sure that offloaders tarball is unzipped in every broker's Pulsar directory.
If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the apachepulsar/pulsar-all image instead of the apachepulsar/pulsar image. apachepulsar/pulsar-all image has already bundled tiered storage offloaders.
A bucket is a basic container that holds your data. Everything you store in Azure BlobStore must be contained in a bucket. You can use a bucket to organize your data and control access to your data, but unlike directory and folder, you cannot nest a bucket.
You can configure the size of a request sent to or read from Azure BlobStore in the configuration file broker.conf or standalone.conf.
Configuration
Description
Default value
managedLedgerOffloadReadBufferSizeInBytes
Block size for each individual read when reading back data from Azure BlobStore store.
1 MB
managedLedgerOffloadMaxBlockSizeInBytes
Maximum size of a "part" sent during a multipart upload to Azure BlobStore store. It cannot be smaller than 5 MB.
64 MB
Configure Azure BlobStore offloader to run automatically
Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically.
Threshold value
Action
0 | It triggers the offloading operation if the topic storage reaches its threshold.
= 0|It causes a broker to offload data as soon as possible.
< 0 |It disables automatic offloading operation.
Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace, but few messages are being produced to the topic, offloader does not work until the current segment is full.
You can configure the threshold size using CLI tools, such as pulsar-admin.
The offload configurations in broker.conf and standalone.conf are used for the namespaces that do not have namespace level offload policies. Each namespace can have its own offload policy. If you want to set offload policy for each namespace, use the command pulsar-admin namespaces set-offload-policies options command.
For more information about the pulsar-admin namespaces set-offload-threshold options command, including flags, descriptions, and default values, see here.
Configure Azure BlobStore offloader to run manually
For individual topics, you can trigger Azure BlobStore offloader manually using one of the following methods:
Use REST endpoint.
Use CLI tools (such as pulsar-admin).
To trigger it via CLI tools, you need to specify the maximum amount of data (threshold) that should be retained on a Pulsar cluster for a topic. If the size of the topic data on the Pulsar cluster exceeds this threshold, segments from the topic are moved to Azure BlobStore until the threshold is no longer exceeded. Older segments are moved first.