HDFS Connector

Sink

The HDFS Sink Connector pulls messages from Pulsar topics and persists them to an HDFS file.

Sink Configuration Options

| Name | Default | Required | Description |
|------|---------|----------|-------------|
| hdfsConfigResources | null | true | A file or comma-separated list of files containing the Hadoop file system configuration, e.g. 'core-site.xml', 'hdfs-site.xml'. |
| directory | null | true | The HDFS directory that files are read from or written to. |
| encoding | null | false | The character encoding for the files, e.g. UTF-8, ASCII, etc. |
| compression | null | false | The compression codec used to compress or decompress the files on HDFS. |
| kerberosUserPrincipal | null | false | The Kerberos user principal account to use for authentication. |
| keytab | null | false | The full pathname of the Kerberos keytab file to use for authentication. |
| filenamePrefix | null | false | The prefix of the files to create inside the HDFS directory, e.g. a value of "topicA" will result in files named topicA-, topicA-, etc. being produced. |
| fileExtension | null | false | The extension to add to the files written to HDFS, e.g. '.txt', '.seq', etc. |
| separator | null | false | The character used to separate records in a text file. If no value is provided, the content of all records is concatenated into one continuous byte array. |
| syncInterval | null | false | The interval (in milliseconds) between calls to flush data to HDFS disk. |
| maxPendingRecords | Integer.MAX_VALUE | false | The maximum number of records held in memory before acking. Setting this value to 1 sends every record to disk before it is acked, while a higher value allows records to be buffered before they are all flushed to disk. |
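
The options above can be assembled into a single configuration map. The following is a minimal sketch in Python, assuming the options are supplied as a flat set of key/value pairs; the helper `build_sink_config`, the directory path, and all values shown are hypothetical placeholders, and how the resulting JSON is handed to Pulsar depends on your deployment.

```python
import json

# Option names are taken from the table above; every value below is a placeholder.
REQUIRED = ("hdfsConfigResources", "directory")


def build_sink_config(**options):
    """Return the options as a dict, rejecting a missing required option.

    Hypothetical helper for illustration; it is not part of Pulsar.
    """
    missing = [name for name in REQUIRED if not options.get(name)]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))
    return dict(options)


config = build_sink_config(
    hdfsConfigResources="core-site.xml,hdfs-site.xml",
    directory="/pulsar/sink-output",  # hypothetical HDFS path
    encoding="UTF-8",
    filenamePrefix="topicA",
    fileExtension=".txt",
    separator="\n",                   # one record per line
    syncInterval=2000,                # milliseconds between flushes
    maxPendingRecords=500,            # buffer up to 500 records before acking
)

# Serialize the map, for example to store it alongside other deployment files.
print(json.dumps(config, indent=2))
```

Only `hdfsConfigResources` and `directory` are marked as required in the table, so the sketch checks those two and passes every other option through unchanged.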