The S3 offloader serves S3-compatible storage, that is, storage that uses the S3 API as its "language". Any application that speaks the S3 API can plug and play with S3-compatible storage.
This chapter guides you through every step of installing and configuring the S3 offloader and using it with Pulsar.
Download and untar the Pulsar offloaders package, then copy the extracted offloaders directory into the Pulsar directory as offloaders. For details, see Install tiered storage offloaders.
Before offloading data from BookKeeper to S3-compatible storage, you need to configure some properties of the S3 offload driver. You can also configure the S3 offloader to run automatically or trigger it manually.
A bucket is a basic container that holds your data. Everything you store in S3-compatible storage must be contained in a bucket. You can use buckets to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.
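As a rough illustration of the driver properties mentioned above, the following sketch renders a broker.conf-style fragment from a Python dictionary. The property names (managedLedgerOffloadDriver, offloadersDirectory, managedLedgerOffloadBucket, managedLedgerOffloadServiceEndpoint) are assumptions based on the Pulsar tiered storage documentation and should be checked against your Pulsar version. The bucket name and endpoint are placeholder values.

```python
# Sketch only: property names are assumed from the Pulsar tiered storage docs,
# and the bucket name and endpoint below are placeholders.
S3_OFFLOAD_PROPERTIES = {
    "managedLedgerOffloadDriver": "S3",
    "offloadersDirectory": "offloaders",
    "managedLedgerOffloadBucket": "pulsar-topic-offload",
    "managedLedgerOffloadServiceEndpoint": "http://localhost:9000",
}


def render_properties(properties):
    """Render key=value lines in the style used by broker.conf."""
    return "\n".join(f"{key}={value}" for key, value in properties.items())


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before it is copied into broker.conf.
    print(render_properties(S3_OFFLOAD_PROPERTIES))
```

The bucket named here must already exist in the S3-compatible storage, since everything stored there has to live in a bucket.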
Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored in a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically.
| Threshold value | Action |
| --- | --- |
| > 0 | It triggers the offloading operation if the topic storage reaches its threshold. |
| = 0 | It causes a broker to offload data as soon as possible. |
| < 0 | It disables automatic offloading operation. |
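The threshold semantics above can be summarized as a small decision function. This is an illustrative model, not Pulsar's implementation: the function name and the returned labels are hypothetical, and treating "reaches its threshold" as "greater than or equal to" is an assumption.

```python
def automatic_offload_action(threshold_bytes, topic_size_bytes):
    """Return the action implied by the threshold value for a topic of the given size.

    Hypothetical helper that models the documented threshold semantics.
    """
    if threshold_bytes < 0:
        # A negative threshold disables automatic offloading.
        return "disabled"
    if threshold_bytes == 0:
        # A zero threshold asks the broker to offload as soon as possible.
        return "offload as soon as possible"
    if topic_size_bytes >= threshold_bytes:
        # Assumption: "reaches" means the stored size is at or above the threshold.
        return "offload"
    return "wait"
```

For example, with a threshold of 10 MB, a topic storing 12 MB maps to "offload", while a topic storing 4 MB maps to "wait".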
Automatic offloading runs when a new segment is added to a topic log. If you set the threshold for a namespace but few messages are being produced to the topic, offloading does not start until the current segment is full.
You can configure the threshold size using CLI tools, such as pulsar-admin.
The offload configurations in broker.conf and standalone.conf apply to namespaces that do not have namespace-level offload policies. Each namespace can have its own offload policy. To set an offload policy for a specific namespace, use the pulsar-admin namespaces set-offload-policies options command.
For more information about the pulsar-admin namespaces set-offload-threshold options command, including flags, descriptions, and default values, see Pulsar admin docs.
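If you script the threshold setting, a thin wrapper around pulsar-admin might look like the sketch below. The set-offload-threshold subcommand is the one named in the text. The --size flag and the bin/pulsar-admin path are assumptions based on the Pulsar admin docs, and my-tenant/my-namespace and 10M are placeholder values.

```python
import subprocess


def set_offload_threshold_command(namespace, size, pulsar_admin="bin/pulsar-admin"):
    """Build the argument list for setting a namespace offload threshold.

    Assumption: the subcommand accepts --size with values such as "10M".
    """
    return [pulsar_admin, "namespaces", "set-offload-threshold", "--size", size, namespace]


def set_offload_threshold(namespace, size, pulsar_admin="bin/pulsar-admin"):
    """Run the command and raise CalledProcessError if pulsar-admin fails."""
    subprocess.run(set_offload_threshold_command(namespace, size, pulsar_admin), check=True)


if __name__ == "__main__":
    # Show the command line without executing it.
    print(" ".join(set_offload_threshold_command("my-tenant/my-namespace", "10M")))
```

Building the argument list separately from running it lets you log or review the exact command before it touches a cluster.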
To trigger offloading manually via CLI tools, specify the maximum amount of data (the threshold) that should be retained in the Pulsar cluster for a topic. If the size of the topic data in the Pulsar cluster exceeds this threshold, segments from the topic are moved to S3-compatible storage until the threshold is no longer exceeded. Older segments are moved first.
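The oldest-first rule can be made concrete with a small model. This is a sketch of the described behavior, not Pulsar's code: the function name is hypothetical, and it treats every segment as eligible for offloading, which is a simplification.

```python
def segments_to_offload(segment_sizes, threshold_bytes):
    """Return indices of segments to move, given sizes ordered oldest first.

    Segments are moved, oldest first, until the data retained in the
    cluster no longer exceeds the threshold.
    """
    retained = sum(segment_sizes)
    moved = []
    for index, size in enumerate(segment_sizes):
        if retained <= threshold_bytes:
            break
        moved.append(index)
        retained -= size
    return moved
```

With four 4 MB segments and a 10 MB threshold, the two oldest segments are moved: the topic holds 16 MB at first, 12 MB after the first move, and 8 MB after the second, which no longer exceeds the threshold.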
For more information about the pulsar-admin topics offload-status options command, including flags, descriptions, and default values, see Pulsar admin docs.
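A manual offload and its status check could be scripted along the following lines. The offload-status subcommand is the one named in the text. The topics offload subcommand, its --size-threshold flag, and the bin/pulsar-admin path are assumptions based on the Pulsar admin docs, and the topic name and 10M are placeholders.

```python
import subprocess


def offload_command(topic, size_threshold, pulsar_admin="bin/pulsar-admin"):
    """Build the argument list that triggers a manual offload for one topic.

    Assumption: the subcommand accepts --size-threshold with values such as "10M".
    """
    return [pulsar_admin, "topics", "offload", "--size-threshold", size_threshold, topic]


def offload_status_command(topic, pulsar_admin="bin/pulsar-admin"):
    """Build the argument list that queries the offload status of one topic."""
    return [pulsar_admin, "topics", "offload-status", topic]


def run(command):
    """Run a command and return its standard output as text."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    topic = "my-tenant/my-namespace/topic1"  # placeholder topic name
    # Show both command lines without executing them.
    print(" ".join(offload_command(topic, "10M")))
    print(" ".join(offload_status_command(topic)))
```

Because offloading takes time, a script would typically call the status command after triggering the offload rather than assume the data has already moved.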