Alluxio sink connector

Note: You can download all Pulsar connectors from the download page.

Sink

The Alluxio sink connector pulls messages from Pulsar topics and persists the messages to an Alluxio directory.

Configuration

The configuration of the Alluxio sink connector has the following properties.

Property

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| alluxioMasterHost | String | true | "" (empty string) | The hostname of the Alluxio master. |
| alluxioMasterPort | int | true | 19998 | The port that the Alluxio master node runs on. |
| alluxioDir | String | true | "" (empty string) | The Alluxio directory that files are read from or written to. |
| securityLoginUser | String | false | "" (empty string) | When alluxio.security.authentication.type is set to SIMPLE or CUSTOM, the user application uses this property to indicate the user requesting the Alluxio service. If it is not set explicitly, the OS login user is used. |
| filePrefix | String | false | "" (empty string) | The prefix of the files to create in the Alluxio directory (e.g. a value of 'TopicA' results in files named TopicA-, TopicA-, etc.). |
| fileExtension | String | false | "" (empty string) | The extension to add to the files written to Alluxio (e.g. '.txt'). |
| lineSeparator | String | false | "" (empty string) | The character used to separate records in a text file. If no value is provided, the content of all records is concatenated into one continuous byte array. |
| rotationRecords | long | false | 10000 | The number of records written before an Alluxio file is rotated. |
| rotationInterval | long | false | -1 | The interval at which an Alluxio file is rotated (in milliseconds). |
| schemaEnable | boolean | false | false | Sets whether the sink takes the schema into account or simply copies the raw message to Alluxio. |
| writeType | String | false | MUST_CACHE | The default write type used when creating Alluxio files. Valid options are MUST_CACHE (data is written only to Alluxio and must be stored in Alluxio), CACHE_THROUGH (try to cache, and write to the UnderFS synchronously), and THROUGH (no cache; write to the UnderFS synchronously). |
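
To make the interaction of rotationRecords, rotationInterval, and lineSeparator easier to picture, here is a minimal Python sketch of one plausible reading of those properties. It is not the connector's implementation. The class `RotationSketch` and its methods are hypothetical, and two behaviors are assumptions the table does not state: that a non-positive rotationInterval (such as the default -1) disables time-based rotation, and that the separator is appended after every record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RotationSketch:
    """Illustrative model only; names and behavior are assumptions."""

    rotation_records: int = 10000    # default from the property table
    rotation_interval_ms: int = -1   # default; assumed to mean "disabled"
    line_separator: str = ""         # default; records are simply concatenated
    buffer: List[bytes] = field(default_factory=list)
    opened_at_ms: int = 0

    def append(self, record: bytes, now_ms: int) -> bool:
        """Buffer one record and report whether the file should rotate now."""
        if not self.buffer:
            # The first record of a new file starts the interval clock.
            self.opened_at_ms = now_ms
        self.buffer.append(record)
        by_count = len(self.buffer) >= self.rotation_records
        by_time = (
            self.rotation_interval_ms > 0
            and now_ms - self.opened_at_ms >= self.rotation_interval_ms
        )
        return by_count or by_time

    def flush(self) -> bytes:
        """Return the bytes for the finished file and start a new buffer."""
        sep = self.line_separator.encode("utf-8")
        # Assumption: the separator follows each record, including the last.
        data = b"".join(record + sep for record in self.buffer)
        self.buffer = []
        return data
```

Under these assumptions, a sink configured with rotationRecords set to 3 and lineSeparator set to a newline would signal rotation on the third call to `append`, and `flush` would return the three records with one newline after each.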

Example

Before using the Alluxio sink connector, create a configuration file in the directory from which you start the Pulsar service (i.e. PULSAR_HOME), using one of the following formats.

  • JSON

    {
        "alluxioMasterHost": "localhost",
        "alluxioMasterPort": "19998",
        "alluxioDir": "pulsar",
        "filePrefix": "TopicA",
        "fileExtension": ".txt",
        "lineSeparator": "\n",
        "rotationRecords": "100",
        "rotationInterval": "-1"
    }
  • YAML

    configs:
        alluxioMasterHost: "localhost"
        alluxioMasterPort: "19998"
        alluxioDir: "pulsar"
        filePrefix: "TopicA"
        fileExtension: ".txt"
        lineSeparator: "\n"
        rotationRecords: 100
        rotationInterval: "-1"
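
If you would rather generate the JSON file than write it by hand, the following Python sketch shows one way to do it. It fills in the defaults from the property table, checks that the three required properties are set, and renders the result as JSON. The helpers `build_config` and `render_config` are hypothetical and not part of Pulsar. Unlike the examples above, which quote some numeric values, the sketch emits them as plain JSON numbers; whether the connector accepts both forms is not verified here.

```python
import json

# Properties marked as required in the property table.
REQUIRED = ("alluxioMasterHost", "alluxioMasterPort", "alluxioDir")

# Defaults copied from the property table.
DEFAULTS = {
    "alluxioMasterHost": "",
    "alluxioMasterPort": 19998,
    "alluxioDir": "",
    "securityLoginUser": "",
    "filePrefix": "",
    "fileExtension": "",
    "lineSeparator": "",
    "rotationRecords": 10000,
    "rotationInterval": -1,
    "schemaEnable": False,
    "writeType": "MUST_CACHE",
}

# Write types listed in the writeType description.
WRITE_TYPES = ("MUST_CACHE", "CACHE_THROUGH", "THROUGH")


def build_config(overrides):
    """Merge overrides onto the defaults and validate the result."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")
    cfg = {**DEFAULTS, **overrides}
    # A required property left at its empty default counts as missing.
    missing = [name for name in REQUIRED if cfg[name] in ("", None)]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    if cfg["writeType"] not in WRITE_TYPES:
        raise ValueError(f"invalid writeType: {cfg['writeType']}")
    return cfg


def render_config(cfg):
    """Serialize the configuration as indented JSON text."""
    return json.dumps(cfg, indent=4)
```

For example, `render_config(build_config({"alluxioMasterHost": "localhost", "alluxioDir": "pulsar"}))` returns JSON text that you could save under a file name of your choice in PULSAR_HOME.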