This chapter guides you through every step of installing and configuring the Azure BlobStore offloader and using it with Pulsar.
Follow the steps below to install the Azure BlobStore offloader.
- Pulsar: 2.6.2 or later versions
This example uses Pulsar 2.6.2.
Download the Pulsar tarball using one of the following ways:
Download and untar the Pulsar offloaders package.
wget https://downloads.apache.org/pulsar/pulsar-2.6.2/apache-pulsar-offloaders-2.6.2-bin.tar.gz
tar xvfz apache-pulsar-offloaders-2.6.2-bin.tar.gz
Copy the Pulsar offloaders as `offloaders` in the Pulsar directory.
mv apache-pulsar-offloaders-2.6.2/offloaders apache-pulsar-2.6.2/offloaders
ls offloaders
If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` tarball is unzipped in every broker's Pulsar directory.
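On a bare metal cluster, it can help to confirm on each broker that the `offloaders` directory is actually in place. The following is a minimal sketch of such a check. The function name `has_offloaders` and the example path are illustrative assumptions, not part of Pulsar.

```bash
# has_offloaders PULSAR_DIR
# Succeeds when PULSAR_DIR/offloaders exists and contains at least one entry.
has_offloaders() {
  pulsar_dir="$1"
  [ -d "${pulsar_dir}/offloaders" ] || return 1
  # ls -A lists everything except . and .., so an empty result means an empty directory.
  [ -n "$(ls -A "${pulsar_dir}/offloaders")" ]
}

# Example (the path is an assumption; adjust it to your installation):
# has_offloaders apache-pulsar-2.6.2 && echo "offloaders found"
```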
If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image has already bundled tiered storage offloaders.
Before offloading data from BookKeeper to Azure BlobStore, you need to configure some properties of the Azure BlobStore offload driver.
You can also configure the Azure BlobStore offloader to run automatically, or trigger it manually.
Configure Azure BlobStore offloader driver
You can configure the Azure BlobStore offloader driver in the configuration file
Required configurations are as below.
Required configuration | Description | Example value
|---|---|---
`managedLedgerOffloadDriver` | Offloader driver name | azureblob
`offloadersDirectory` | Offloader directory | offloaders
`managedLedgerOffloadBucket` | Bucket | pulsar-topic-offload
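As a rough illustration, the three required settings can be written as `key=value` lines and checked before a broker is started. This sketch uses the example values from the table above. The scratch file created with `mktemp` stands in for the real configuration file, and the helper name `check_required_offload_keys` is a hypothetical one introduced here.

```bash
# Scratch file standing in for the broker configuration file (hypothetical path).
conf_file="$(mktemp)"
cat > "$conf_file" <<'EOF'
managedLedgerOffloadDriver=azureblob
offloadersDirectory=offloaders
managedLedgerOffloadBucket=pulsar-topic-offload
EOF

# Print every required key that is missing or empty; succeed only if none are.
check_required_offload_keys() {
  rc=0
  for key in managedLedgerOffloadDriver offloadersDirectory managedLedgerOffloadBucket; do
    if ! grep -Eq "^${key}=.+" "$1"; then
      echo "missing or empty: ${key}"
      rc=1
    fi
  done
  return "$rc"
}

check_required_offload_keys "$conf_file" && echo "all required offload settings are present"
```

Running the same function against the configuration file on each broker gives a quick sanity check that no required key was left out.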
Optional configurations are as below.
Optional | Description | Example value
|---|---|---
`managedLedgerOffloadReadBufferSizeInBytes` | Size of block read | 1 MB
`managedLedgerOffloadMaxBlockSizeInBytes` | Size of block write | 64 MB
`managedLedgerMinLedgerRolloverTimeMinutes` | Minimum time between ledger rollovers for a topic. Note: setting this configuration in a production environment is not recommended. | 2
`managedLedgerMaxEntriesPerLedger` | Maximum number of entries to append to a ledger before triggering a rollover. Note: setting this configuration in a production environment is not recommended. | 5000
A bucket is a basic container that holds your data. Everything you store in Azure BlobStore must be contained in a bucket. You can use buckets to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.
This example names the bucket as pulsar-topic-offload.
To be able to access Azure BlobStore, you need to authenticate with Azure BlobStore.
Set the environment variables `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_ACCESS_KEY`.
"export" is important so that the variables are made available in the environment of spawned processes.
export AZURE_STORAGE_ACCOUNT=ABC123456789
export AZURE_STORAGE_ACCESS_KEY=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
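Because a variable that is set but not exported is invisible to spawned processes, a quick check before starting a broker can catch a forgotten `export`. The following is a minimal sketch. The function name `check_azure_env` is an assumption made for illustration, and it relies on `printenv`, which only reports variables present in the exported environment.

```bash
# Succeeds only when both credentials are exported to child processes.
check_azure_env() {
  rc=0
  for name in AZURE_STORAGE_ACCOUNT AZURE_STORAGE_ACCESS_KEY; do
    # printenv runs as a child process, so it sees exported variables only.
    if ! printenv "$name" >/dev/null; then
      echo "${name} is not exported" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_azure_env && echo "Azure credentials are visible to child processes"
```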
Size of block read/write
You can configure the size of a request sent to or read from Azure BlobStore in the configuration file
Configuration | Description | Default value
|---|---|---
`managedLedgerOffloadReadBufferSizeInBytes` | Block size for each individual read when reading back data from Azure BlobStore. | 1 MB
`managedLedgerOffloadMaxBlockSizeInBytes` | Maximum size of a "part" sent during a multipart upload to Azure BlobStore. It cannot be smaller than 5 MB. | 64 MB
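Both setting names end in `InBytes`, so the megabyte figures in the table have to be converted before they are written to a configuration file. The sketch below does the arithmetic and checks the 5 MB lower bound on the block size. It assumes 1 MB means 1024 * 1024 bytes, and `mb_to_bytes` is a helper name introduced here for illustration.

```bash
# Convert megabytes to bytes (assumes 1 MB = 1024 * 1024 bytes).
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

read_buffer_bytes="$(mb_to_bytes 1)"
max_block_bytes="$(mb_to_bytes 64)"
min_block_bytes="$(mb_to_bytes 5)"

# The maximum block size cannot be smaller than 5 MB.
if [ "$max_block_bytes" -lt "$min_block_bytes" ]; then
  echo "managedLedgerOffloadMaxBlockSizeInBytes must be at least ${min_block_bytes}" >&2
else
  echo "managedLedgerOffloadReadBufferSizeInBytes=${read_buffer_bytes}"
  echo "managedLedgerOffloadMaxBlockSizeInBytes=${max_block_bytes}"
fi
```

Under that assumption, the defaults of 1 MB and 64 MB correspond to 1048576 and 67108864 bytes.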
Configure Azure BlobStore offloader to run automatically
Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically.
Threshold value | Action
|---|---
> 0 | It triggers the offloading operation if the topic storage reaches its threshold.
= 0 | It causes a broker to offload data as soon as possible.
< 0 | It disables automatic offloading operation.
Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace but few messages are being produced to the topic, the offloader does not run until the current segment is full.
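The three threshold cases can be summarized as a small decision function. This is only an illustrative sketch of the rules described above, not Pulsar code, and the function name `describe_offload_threshold` is an assumption.

```bash
# Map a threshold value to the behavior described for automatic offloading.
describe_offload_threshold() {
  threshold="$1"
  if [ "$threshold" -gt 0 ]; then
    echo "offload once topic storage reaches the threshold (${threshold})"
  elif [ "$threshold" -eq 0 ]; then
    echo "offload as soon as possible"
  else
    echo "automatic offloading disabled"
  fi
}

describe_offload_threshold 10485760
describe_offload_threshold 0
describe_offload_threshold -1
```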
You can configure the threshold size using CLI tools, such as pulsar-admin.
The offload configurations in `standalone.conf` are used for the namespaces that do not have namespace-level offload policies. Each namespace can have its own offload policy. If you want to set an offload policy for a specific namespace, use the `pulsar-admin namespaces set-offload-policies options` command.
This example sets the Azure BlobStore offloader threshold size to 10 MB using pulsar-admin.
bin/pulsar-admin namespaces set-offload-threshold --size 10M my-tenant/my-namespace
For more information about the `pulsar-admin namespaces set-offload-threshold options` command, including flags, descriptions, and default values, see the pulsar-admin reference documentation.
Configure Azure BlobStore offloader to run manually
For individual topics, you can trigger the Azure BlobStore offloader manually using one of the following methods:
- Use the REST endpoint.
- Use CLI tools (such as pulsar-admin).
To trigger it via CLI tools, you need to specify the maximum amount of data (threshold) that should be retained on a Pulsar cluster for a topic. If the size of the topic data on the Pulsar cluster exceeds this threshold, segments from the topic are moved to Azure BlobStore until the threshold is no longer exceeded. Older segments are moved first.
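The "oldest first, until the threshold is no longer exceeded" rule can be illustrated with a small simulation. This is a simplified sketch of the behavior described above, not Pulsar's actual implementation. The function name `plan_offload` and the segment sizes are made up for the example, with sizes given oldest first in arbitrary units.

```bash
# plan_offload THRESHOLD SIZE...
# Segment sizes are listed oldest first. Prints which segments would be moved.
plan_offload() {
  threshold="$1"
  shift
  total=0
  for size in "$@"; do
    total=$((total + size))
  done
  index=1
  for size in "$@"; do
    # Stop as soon as the data left on the cluster no longer exceeds the threshold.
    [ "$total" -gt "$threshold" ] || break
    echo "offload segment ${index} (size ${size})"
    total=$((total - size))
    index=$((index + 1))
  done
  echo "retained on cluster: ${total}"
}

# Four segments of size 4 with a threshold of 10: the two oldest are moved.
plan_offload 10 4 4 4 4
```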
This example triggers the Azure BlobStore offloader to run manually using pulsar-admin.
bin/pulsar-admin topics offload --size-threshold 10M my-tenant/my-namespace/topic1
Offload triggered for persistent://my-tenant/my-namespace/topic1 for messages before 2:0:-1
For more information about the `pulsar-admin topics offload options` command, including flags, descriptions, and default values, see the pulsar-admin reference documentation.
This example checks the Azure BlobStore offloader status using pulsar-admin.
bin/pulsar-admin topics offload-status persistent://my-tenant/my-namespace/topic1
Offload is currently running
To wait for the Azure BlobStore offloader to complete the job, add the `-w` flag.
bin/pulsar-admin topics offload-status -w persistent://my-tenant/my-namespace/topic1
Offload was a success
If there is an error in offloading, the error is propagated to the `pulsar-admin topics offload-status` command.

```bash
bin/pulsar-admin topics offload-status persistent://my-tenant/my-namespace/topic1
```

**Output**

```
Error in offload
null

Reason: Error offloading: org.apache.bookkeeper.mledger.ManagedLedgerException:
```

> #### Tip
>
> For more information about the `pulsar-admin topics offload-status options` command, including flags, descriptions, and default values, see [here](http://pulsar.apache.org/tools/pulsar-admin/2.6.0-SNAPSHOT/#-em-offload-status-em-).
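When the status check is part of a script, the first line of the output can be mapped to a short state. The sketch below matches only the three messages shown in this section. The function name `classify_offload_status` is an assumption, and any other output is reported as `unknown` rather than guessed at.

```bash
# Map the text printed by `pulsar-admin topics offload-status` to a short state.
classify_offload_status() {
  case "$1" in
    "Offload was a success"*) echo "success" ;;
    "Offload is currently running"*) echo "running" ;;
    "Error in offload"*) echo "error" ;;
    *) echo "unknown" ;;
  esac
}

# Example usage against a live cluster:
# classify_offload_status "$(bin/pulsar-admin topics offload-status persistent://my-tenant/my-namespace/topic1)"
```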