If you are running Pulsar in a bare metal cluster, make sure that offloaders tarball is unzipped in every broker's Pulsar directory.
If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the apachepulsar/pulsar-all image instead of the apachepulsar/pulsar image. apachepulsar/pulsar-all image has already bundled tiered storage offloaders.
You can configure the AWS S3 offloader driver in the configuration file broker.conf or standalone.conf.
Required configurations are as below.
Offloader driver name, which is case-insensitive.
Note: there is a third driver type, S3, which is identical to AWS S3, though S3 requires that you specify an endpoint URL using s3ManagedLedgerOffloadServiceEndpoint. This is useful if using an S3 compatible data store other than AWS S3.
Optional configurations are as below.
Note: before specifying a value for this parameter, you need to set the following configurations. Otherwise, you might get an error.
A bucket is a basic container that holds your data. Everything you store in AWS S3 must be contained in a bucket. You can use a bucket to organize your data and control access to your data, but unlike directory and folder, you cannot nest a bucket.
Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically.
It triggers the offloading operation if the topic storage reaches its threshold.
It causes a broker to offload data as soon as possible.
It disables automatic offloading operation.
Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace, but few messages are being produced to the topic, offloader does not work until the current segment is full.
You can configure the threshold size using CLI tools, such as pulsar-admin.
The offload configurations in broker.conf and standalone.conf are used for the namespaces that do not have namespace level offload policies. Each namespace can have its own offload policy. If you want to set offload policy for each namespace, use the command pulsar-admin namespaces set-offload-policies options command.
For individual topics, you can trigger AWS S3 offloader manually using one of the following methods:
Use REST endpoint.
Use CLI tools (such as pulsar-admin).
To trigger it via CLI tools, you need to specify the maximum amount of data (threshold) that should be retained on a Pulsar cluster for a topic. If the size of the topic data on the Pulsar cluster exceeds this threshold, segments from the topic are moved to AWS S3 until the threshold is no longer exceeded. Older segments are moved first.