File Connector

Source

The File Source Connector is used to pull messages from files in a directory and persist the messages to a Pulsar topic.

Source Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| inputDirectory | true | null | The input directory from which to pull files. |
| recurse | false | true | Indicates whether to pull files from sub-directories. |
| keepFile | false | false | If true, the file is not deleted after it has been processed, so it is picked up again continually. |
| fileFilter | false | [^\\.].* | Only files whose names match the given regular expression are picked up. |
| pathFilter | false | null | When the recurse property is true, only sub-directories whose path matches the given regular expression are scanned. |
| minimumFileAge | false | 0 | The minimum age that a file must reach in order to be processed; any file younger than this (according to its last modification date) is ignored. |
| maximumFileAge | false | Long.MAX_VALUE | The maximum age that a file can have in order to be processed; any file older than this (according to its last modification date) is ignored. |
| minimumSize | false | 1 | The minimum size (in bytes) that a file must be in order to be processed. |
| maximumSize | false | Double.MAX_VALUE | The maximum size (in bytes) that a file can be in order to be processed. |
| ignoreHiddenFiles | false | true | Indicates whether hidden files should be ignored. |
| pollingInterval | false | 10000 | Indicates how long to wait before performing a directory listing. |
| numWorkers | false | 1 | The number of worker threads that process the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 results in the data from multiple files being "intermingled" in the target topic. |
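
As a rough illustration of how the file-selection options above might combine, the following Python sketch checks a single file against fileFilter, ignoreHiddenFiles, the age bounds, and the size bounds. It is not the connector's implementation. The names FileSourceConfig and is_eligible are hypothetical, and several details are assumptions: that the regular expression must match the whole file name, that a hidden file is one whose name starts with a dot, that all bounds are inclusive, and that a file's age is expressed in the same unit as minimumFileAge and maximumFileAge (the table does not state the unit).

```python
import re
from dataclasses import dataclass


@dataclass
class FileSourceConfig:
    """Hypothetical container for the selection options; defaults follow the table."""
    file_filter: str = r"[^\\.].*"           # copied verbatim from the table
    minimum_file_age: float = 0
    maximum_file_age: float = float("inf")   # stands in for Long.MAX_VALUE
    minimum_size: float = 1
    maximum_size: float = float("inf")       # stands in for Double.MAX_VALUE
    ignore_hidden_files: bool = True


def is_eligible(name: str, size_bytes: int, age: float, cfg: FileSourceConfig) -> bool:
    """Return True if a file passes every selection option in cfg."""
    # Assumption: "hidden" means the file name starts with a dot.
    if cfg.ignore_hidden_files and name.startswith("."):
        return False
    # Assumption: the pattern must match the entire file name.
    if re.fullmatch(cfg.file_filter, name) is None:
        return False
    # Assumption: age bounds are inclusive and share one (unspecified) unit.
    if not (cfg.minimum_file_age <= age <= cfg.maximum_file_age):
        return False
    # Assumption: size bounds are inclusive, measured in bytes.
    return cfg.minimum_size <= size_bytes <= cfg.maximum_size


if __name__ == "__main__":
    defaults = FileSourceConfig()
    print(is_eligible("orders.csv", 2048, 30, defaults))
    print(is_eligible(".orders.csv.swp", 2048, 30, defaults))
    print(is_eligible("empty.log", 0, 30, defaults))
```

With the default values, a zero-byte file is skipped because minimumSize is 1, and a dot-prefixed file is skipped both by ignoreHiddenFiles and by the default fileFilter pattern.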