wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
Note
If you are running Pulsar in a bare-metal cluster, make sure that the offloaders tarball is extracted in every broker's Pulsar directory.
If you are running Pulsar in Docker or deploying Pulsar from a Docker image (for example, on Kubernetes (K8s) or DC/OS), you can use the apachepulsar/pulsar-all image instead of the apachepulsar/pulsar image. The apachepulsar/pulsar-all image already bundles the tiered storage offloaders.
Move the offloaders directory from the extracted offloaders package into the Pulsar directory, keeping the name offloaders.
mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
ls apache-pulsar-2.5.1/offloaders
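On a bare-metal cluster, the offloaders must be present on every broker. The sketch below checks one broker's Pulsar directory for the filesystem offloader. The function name check_fs_offloader is hypothetical, and the NAR file name pattern tiered-storage-file-system-*.nar is an assumption about the contents of the 2.5.1 offloaders package; adjust it if your listing differs.

```shell
#!/bin/sh
# Sketch: check that a broker's Pulsar directory contains the filesystem
# offloader NAR. check_fs_offloader is a hypothetical helper, and the NAR
# name pattern (tiered-storage-file-system-*.nar) is an assumption.
set -eu

# $1 = Pulsar directory on this broker, for example apache-pulsar-2.5.1
check_fs_offloader() {
  offloaders_dir="$1/offloaders"

  if [ ! -d "$offloaders_dir" ]; then
    echo "missing directory: $offloaders_dir" >&2
    return 1
  fi

  # If the glob matches nothing, the shell leaves the pattern unexpanded,
  # so the -e test fails and the loop falls through to the error below.
  for nar in "$offloaders_dir"/tiered-storage-file-system-*.nar; do
    if [ -e "$nar" ]; then
      echo "ok: $nar"
      return 0
    fi
  done

  echo "no filesystem offloader NAR in $offloaders_dir" >&2
  return 1
}
```

Run it on each broker host, for example: check_fs_offloader apache-pulsar-2.5.1. A non-zero exit status means that broker still needs the offloaders package.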
For more information about Hadoop HDFS, see the Apache Hadoop HDFS documentation.
Configure the filesystem offloader to run automatically
You can configure a namespace policy to offload data automatically once a threshold is reached. The threshold is based on the size of the data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offload operation is triggered automatically.
Threshold value | Action
> 0 | Triggers the offloading operation when the topic storage reaches the threshold.
= 0 | Causes a broker to offload data as soon as possible.
< 0 | Disables automatic offloading.
Automatic offload runs when a new segment is added to a topic log. If you set the threshold on a namespace but few messages are being produced to the topic, the offloader does not run until the current segment is full.
You can configure the threshold size using CLI tools, such as pulsar-admin.
For more information about the pulsar-admin namespaces set-offload-threshold command, including flags, descriptions, default values, and shorthands, see the pulsar-admin reference documentation.
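As a sketch of setting the threshold from the command line, the helpers below wrap pulsar-admin namespaces set-offload-threshold (with its --size flag) and get-offload-threshold. The PULSAR_ADMIN variable and both function names are hypothetical conveniences; point PULSAR_ADMIN at the bin/pulsar-admin script of your installation and check the flags against the pulsar-admin reference for your Pulsar version.

```shell
#!/bin/sh
# Sketch: set and read back a namespace offload threshold with pulsar-admin.
# PULSAR_ADMIN and both function names are hypothetical; point PULSAR_ADMIN
# at the bin/pulsar-admin script of your Pulsar installation.
set -eu

PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# $1 = namespace (tenant/namespace), $2 = threshold size such as 10M.
# Per the threshold table: a positive size offloads once a topic exceeds it,
# 0 offloads as soon as possible, and a negative value disables offloading.
set_offload_threshold() {
  "$PULSAR_ADMIN" namespaces set-offload-threshold --size "$2" "$1"
}

# $1 = namespace; prints the threshold currently configured on it.
show_offload_threshold() {
  "$PULSAR_ADMIN" namespaces get-offload-threshold "$1"
}
```

For example, set_offload_threshold my-tenant/my-namespace 10M asks brokers to start offloading once a topic in that namespace stores more than 10M on the cluster; my-tenant/my-namespace is a placeholder.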
For individual topics, you can trigger the filesystem offloader manually using one of the following methods:
Use the REST endpoint.
Use CLI tools (such as pulsar-admin).
To trigger an offload with CLI tools, specify the maximum amount of data (the threshold) that should be retained on the Pulsar cluster for a topic. If the size of the topic data on the Pulsar cluster exceeds this threshold, segments from the topic are offloaded to the filesystem until the threshold is no longer exceeded. Older segments are offloaded first.
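A minimal sketch of the manual trigger, assuming the pulsar-admin topics offload command with its --size-threshold flag. The PULSAR_ADMIN variable, the offload_topic function, and the topic name in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: manually trigger an offload for a single topic with pulsar-admin.
# PULSAR_ADMIN and offload_topic are hypothetical; point PULSAR_ADMIN at
# the bin/pulsar-admin script of your Pulsar installation.
set -eu

PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# $1 = fully qualified topic name, $2 = threshold such as 10M.
# At most $2 of the topic's data stays on the Pulsar cluster; segments
# beyond that are offloaded to the filesystem, oldest first.
offload_topic() {
  "$PULSAR_ADMIN" topics offload --size-threshold "$2" "$1"
}
```

For example: offload_topic persistent://my-tenant/my-namespace/topic1 10M. The command only starts the offload; it runs in the background on the broker.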
For more information about the pulsar-admin topics offload-status command, including flags, descriptions, default values, and shorthands, see the pulsar-admin reference documentation.
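To check on a manually triggered offload, a sketch along these lines wraps pulsar-admin topics offload-status. It assumes the command accepts --wait-complete (short form -w) to block until offloading finishes; confirm this flag in the reference for your version. PULSAR_ADMIN and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: query the status of a topic offload with pulsar-admin.
# PULSAR_ADMIN and the function names are hypothetical; --wait-complete is
# assumed to make offload-status block until the offload finishes.
set -eu

PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# $1 = fully qualified topic name; prints the current offload status once.
offload_status() {
  "$PULSAR_ADMIN" topics offload-status "$1"
}

# $1 = fully qualified topic name; waits until the offload completes.
wait_for_offload() {
  "$PULSAR_ADMIN" topics offload-status --wait-complete "$1"
}
```

Polling with offload_status suits scripts that do other work meanwhile, while wait_for_offload is simpler when a script must not continue until the data is on the filesystem.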