Download and untar the Pulsar offloaders package, then copy the extracted offloaders into the Pulsar directory as a directory named offloaders. See Install tiered storage offloaders.
You can configure the filesystem offloader driver in the broker.conf or standalone.conf configuration file.
HDFS
NFS
For HDFS, the required configurations are as below.
| Parameter | Description | Example value |
|---|---|---|
| managedLedgerOffloadDriver | Offloader driver name, which is case-insensitive. | filesystem |
| fileSystemURI | Connection address, which is the URI to access the default Hadoop distributed file system. | hdfs://127.0.0.1:9000 |
| offloadersDirectory | Offloader directory | offloaders |
| fileSystemProfilePath | Hadoop profile path. The configuration file is stored in the Hadoop profile path. It contains various settings for Hadoop performance tuning. | conf/filesystem_offload_core_site.xml |
Optional configurations are as below.
| Parameter | Description | Example value |
|---|---|---|
| managedLedgerMinLedgerRolloverTimeMinutes | Minimum time between ledger rollover for a topic. Note: it is not recommended to set this parameter in the production environment. | 10 |
| managedLedgerMaxEntriesPerLedger | Maximum number of entries to append to a ledger before triggering a rollover. Note: it is not recommended to set this parameter in the production environment. | 50000 |
For NFS, the required configurations are as below.
| Parameter | Description | Example value |
|---|---|---|
| managedLedgerOffloadDriver | Offloader driver name, which is case-insensitive. | filesystem |
| offloadersDirectory | Offloader directory | offloaders |
| fileSystemProfilePath | NFS profile path. The configuration file is stored in the NFS profile path. It contains various settings for performance tuning. | conf/filesystem_offload_core_site.xml |
Optional configurations are as below.
| Parameter | Description | Example value |
|---|---|---|
| managedLedgerMinLedgerRolloverTimeMinutes | Minimum time between ledger rollover for a topic. Note: it is not recommended to set this parameter in the production environment. | 10 |
| managedLedgerMaxEntriesPerLedger | Maximum number of entries to append to a ledger before triggering a rollover. Note: it is not recommended to set this parameter in the production environment. | 50000 |
You can configure the namespace policy to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic storage reaches the threshold, an offload operation is triggered automatically.
| Threshold value | Action |
|---|---|
| > 0 | It triggers the offloading operation if the topic storage reaches its threshold. |
| = 0 | It causes a broker to offload data as soon as possible. |
| < 0 | It disables automatic offloading operation. |
Automatic offload runs when a new segment is added to a topic log. If you set the threshold on a namespace, but few messages are being produced to the topic, the filesystem offloader does not work until the current segment is full.
You can configure the threshold using CLI tools, such as pulsar-admin.
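As a minimal sketch, the threshold could be set as shown below. The set-offload-threshold subcommand is the one referenced in this section. The bin/pulsar-admin path, the 10M value, and the my-tenant/my-namespace name are illustrative assumptions, and the set_offload_threshold helper is a hypothetical wrapper, not part of Pulsar.

```shell
# Assumption: run from the Pulsar installation directory; set PULSAR_ADMIN to override the CLI path.
set_offload_threshold() {
  # $1 = size threshold (for example 10M), $2 = tenant/namespace
  "${PULSAR_ADMIN:-bin/pulsar-admin}" namespaces set-offload-threshold --size "$1" "$2"
}

# Example call with a hypothetical namespace; skipped when the CLI is not available.
if command -v "${PULSAR_ADMIN:-bin/pulsar-admin}" >/dev/null 2>&1; then
  set_offload_threshold 10M my-tenant/my-namespace
fi
```

Per the threshold table above, passing 0 would ask the broker to offload as soon as possible, and a negative value would disable automatic offloading.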
For more information about the pulsar-admin namespaces set-offload-threshold options command, including flags, descriptions, default values, and shorthands, see Pulsar admin docs.
For individual topics, you can trigger the filesystem offloader manually using one of the following methods:
Use the REST endpoint.
Use CLI tools (such as pulsar-admin).
To manually trigger the filesystem offloader via CLI tools, you need to specify the maximum amount of data (threshold) that should be retained on a Pulsar cluster for a topic. If the size of the topic data on the Pulsar cluster exceeds this threshold, segments from the topic are offloaded to the filesystem until the threshold is no longer exceeded. Older segments are offloaded first.
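A manual trigger along these lines might look like the following sketch. The topic name matches the sample output in this section. The 10M threshold and the bin/pulsar-admin path are assumptions, and trigger_offload is a hypothetical helper.

```shell
# Assumption: run from the Pulsar installation directory; set PULSAR_ADMIN to override the CLI path.
trigger_offload() {
  # $1 = maximum amount of data to keep on the Pulsar cluster (for example 10M), $2 = topic name
  "${PULSAR_ADMIN:-bin/pulsar-admin}" topics offload --size-threshold "$1" "$2"
}

# Example call; skipped when the CLI is not available.
if command -v "${PULSAR_ADMIN:-bin/pulsar-admin}" >/dev/null 2>&1; then
  trigger_offload 10M persistent://my-tenant/my-namespace/topic1
fi
```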
When the trigger succeeds, pulsar-admin prints a message like the following.
Offload triggered for persistent://my-tenant/my-namespace/topic1 for messages before 2:0:-1
Tip: For more information about the pulsar-admin topics offload options command, including flags, descriptions, default values, and shorthands, see Pulsar admin docs.
This example checks filesystem offloader status using pulsar-admin.
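The status check could be sketched as below. The topic name and the bin/pulsar-admin path are assumptions, and offload_status is a hypothetical helper.

```shell
# Assumption: run from the Pulsar installation directory; set PULSAR_ADMIN to override the CLI path.
offload_status() {
  # $1 = topic name
  "${PULSAR_ADMIN:-bin/pulsar-admin}" topics offload-status "$1"
}

# Example call; skipped when the CLI is not available.
if command -v "${PULSAR_ADMIN:-bin/pulsar-admin}" >/dev/null 2>&1; then
  offload_status persistent://my-tenant/my-namespace/topic1
fi
```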
For more information about the pulsar-admin topics offload-status options command, including flags, descriptions, default values, and shorthands, see Pulsar admin docs.
This section provides step-by-step instructions on how to use the filesystem offloader to move data from Pulsar to Hadoop Distributed File System (HDFS) or Network File System (NFS).
    # Check that you can ssh to localhost without a passphrase:
    ssh localhost
    # If you cannot ssh to localhost without a passphrase, execute the following commands:
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
Start HDFS.
    # Do not run the format command repeatedly: reformatting makes the clusterId of the datanode inconsistent with that of the namenode.
    $HADOOP_HOME/bin/hadoop namenode -format
    $HADOOP_HOME/sbin/start-dfs.sh
As indicated in the configuration section, you need to configure some properties for the filesystem offloader driver before using it. This tutorial assumes that you have configured the filesystem offloader driver as below and run Pulsar in standalone mode.
Set the following configurations in the conf/standalone.conf file.
For testing purposes, you can set the following two configurations to speed up ledger rollover, but it is not recommended that you set them in the production environment.
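The HDFS settings described in this step can be sketched as follows. The driver values come from the configuration tables in this document. The two rollover values (1 and 100) are illustrative assumptions chosen only to make rollover frequent during testing, and HDFS_CONF_FILE is a hypothetical variable that defaults to a scratch file, so nothing is written to conf/standalone.conf unless you point it there.

```shell
# Assumption: HDFS_CONF_FILE is a scratch file by default; set it to conf/standalone.conf to apply.
HDFS_CONF_FILE="${HDFS_CONF_FILE:-./filesystem-offloader-hdfs.conf}"

cat >> "$HDFS_CONF_FILE" <<'EOF'
# Filesystem offloader driver (values from the configuration tables)
managedLedgerOffloadDriver=filesystem
fileSystemURI=hdfs://127.0.0.1:9000
offloadersDirectory=offloaders
fileSystemProfilePath=conf/filesystem_offload_core_site.xml

# Testing only: assumed small values to speed up ledger rollover
managedLedgerMinLedgerRolloverTimeMinutes=1
managedLedgerMaxEntriesPerLedger=100
EOF
```

If any of these keys already exists in the target file, editing the existing line is safer than appending a duplicate.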
Step 4: Offload data from BookKeeper to filesystem
Execute the following commands in the directory where you extracted the Pulsar tarball. For example, ~/path/to/apache-pulsar-2.5.1.
Start Pulsar standalone.
bin/pulsar standalone -a 127.0.0.1
To ensure that the generated data is not deleted immediately, set a retention policy, which can be a size limit, a time limit, or both. The larger the value you set for the retention policy, the longer the data is retained.
bin/pulsar-admin namespaces set-retention public/default --size 100M --time 2d
Tip: For more information about the pulsarctl namespaces set-retention options command, including flags, descriptions, default values, and shorthands, see the pulsarctl documentation.
Produce data using pulsar-client.
bin/pulsar-client produce -m "Hello FileSystem Offloader" -n 1000 public/default/fs-test
The offloading operation starts after a ledger rollover is triggered. To ensure that data is offloaded successfully, wait until several ledger rollovers have been triggered, which might take a moment in this example. You can check the ledger status using pulsarctl.
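As a sketch, the ledger status can also be inspected with pulsar-admin, which the other commands in this tutorial use. This swaps in pulsar-admin topics stats-internal in place of pulsarctl, which is an assumption about the tool you have at hand. The topic name follows the produce command above, and topic_internal_stats is a hypothetical helper.

```shell
# Assumption: run from the Pulsar installation directory; set PULSAR_ADMIN to override the CLI path.
topic_internal_stats() {
  # $1 = topic name; prints the topic's internal stats, including its ledger list
  "${PULSAR_ADMIN:-bin/pulsar-admin}" topics stats-internal "$1"
}

# Example call; skipped when the CLI is not available.
if command -v "${PULSAR_ADMIN:-bin/pulsar-admin}" >/dev/null 2>&1; then
  topic_internal_stats public/default/fs-test
fi
```

Several entries in the ledger list of the output indicate that rollover has happened and that older ledgers are candidates for offloading.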
This section assumes that you have enabled the NFS service and set its shared path. In the examples, /Users/test is used as the shared path of the NFS service.
This example mounts the NFS shared path /Users/test at the local directory /Users/pulsar_nfs.
mount -e 192.168.0.103:/Users/test /Users/pulsar_nfs
Step 3: Configure the filesystem offloader driver
As indicated in the configuration section, you need to configure some properties for the filesystem offloader driver before using it. This tutorial assumes that you have configured the filesystem offloader driver as below and run Pulsar in standalone mode.
Set the following configurations in the conf/standalone.conf file.
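For NFS, the settings could be sketched as follows, using the values from the NFS configuration table in this document. NFS_CONF_FILE is a hypothetical variable that defaults to a scratch file, so nothing is written to conf/standalone.conf unless you point it there.

```shell
# Assumption: NFS_CONF_FILE is a scratch file by default; set it to conf/standalone.conf to apply.
NFS_CONF_FILE="${NFS_CONF_FILE:-./filesystem-offloader-nfs.conf}"

cat >> "$NFS_CONF_FILE" <<'EOF'
# Filesystem offloader driver for NFS (values from the NFS configuration table)
managedLedgerOffloadDriver=filesystem
offloadersDirectory=offloaders
fileSystemProfilePath=conf/filesystem_offload_core_site.xml
EOF
```

Unlike the HDFS case, no fileSystemURI is listed among the required NFS settings. The storage location is expected to come from the profile file named by fileSystemProfilePath.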