This guide describes how to develop Pulsar connectors to move data between Pulsar and other systems.
Pulsar connectors are special Pulsar Functions, so creating a Pulsar connector is similar to creating a Pulsar function.
Pulsar connectors come in two types:
|Type|Description|Example|
|---|---|---|
|Source|Import data from another system to Pulsar.|RabbitMQ source connector imports the messages of a RabbitMQ queue to a Pulsar topic.|
|Sink|Export data from Pulsar to another system.|Kinesis sink connector exports the messages of a Pulsar topic to a Kinesis stream.|
You can develop Pulsar source connectors and sink connectors.
    /**
     * Open connector with configuration
     *
     * @param config initialization config
     * @param sourceContext
     * @throws Exception IO type exceptions when opening a connector
     */
    void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception;
This method is called when the source connector is initialized.
In this method, you can retrieve all connector-specific settings through the passed-in config parameter and initialize all necessary resources.
For example, a Kafka connector can create a Kafka client in this open method.
In addition, the Pulsar runtime provides a SourceContext that the connector can use to access runtime resources for tasks such as collecting metrics. The implementation can save the SourceContext for future use.
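As a minimal sketch of the initialization described above, the following example reads settings from the config map and keeps the context for later use. To stay self-contained, it declares a stand-in `SourceContext` interface instead of importing the Pulsar one. The class name `QueueSource`, the `queueName` and `batchSize` keys, and the default value are hypothetical and not part of Pulsar.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Pulsar's SourceContext so the sketch compiles on its own (assumption).
interface SourceContext {}

// Hypothetical connector class; a real one would implement Pulsar's source interface.
class QueueSource {
    private String queueName;
    private int batchSize;
    private SourceContext context;

    // Mirrors the open(config, sourceContext) signature shown above.
    public void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception {
        // Hypothetical required setting: fail fast when it is missing.
        Object name = config.get("queueName");
        if (name == null) {
            throw new IllegalArgumentException("queueName is required");
        }
        this.queueName = name.toString();

        // Hypothetical optional setting with a default value.
        this.batchSize = Integer.parseInt(config.getOrDefault("batchSize", 100).toString());

        // Save the context for future use, for example to report metrics.
        this.context = sourceContext;

        // A real connector would create its client for the external system here.
    }

    public String getQueueName() { return queueName; }
    public int getBatchSize() { return batchSize; }
    public SourceContext getContext() { return context; }
}
```

Validating the configuration inside open lets a misconfigured connector fail at startup rather than on the first read.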
    /**
     * Reads the next message from source.
     * If source does not have any new messages, this call should block.
     * @return next message from source. The return result should never be null
     * @throws Exception
     */
    Record<T> read() throws Exception;
If there is nothing to return, the implementation should block rather than return null.
The returned Record should encapsulate the following information, which is needed by the Pulsar IO runtime.
Record should provide the following variables:
|Variable|Required|Description|
|---|---|---|
|TopicName|No|Pulsar topic name from which the record originated.|
|Key|No|Messages can optionally be tagged with keys. For more information, see Routing modes.|
|Value|Yes|Actual data of the record.|
|EventTime|No|Event time of the record from the source.|
|PartitionId|No|If the record originated from a partitioned source, it returns its PartitionId. PartitionId is used as a part of the unique identifier by the Pulsar IO runtime to deduplicate messages and achieve an exactly-once processing guarantee.|
|RecordSequence|No|If the record originated from a sequential source, it returns its RecordSequence. RecordSequence is used as a part of the unique identifier by the Pulsar IO runtime to deduplicate messages and achieve an exactly-once processing guarantee.|
|Properties|No|If the record carries user-defined properties, it returns those properties.|
|DestinationTopic|No|Topic to which the message should be written.|
|Message|No|A class that carries data sent by users. For more information, see Message.java.|
Record should provide the following methods:
|Method|Description|
|---|---|
|ack|Acknowledge that the record is fully processed.|
|fail|Indicate that the record fails to be processed.|
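The blocking read and the record contract described above can be sketched as follows. This is an illustration under stated assumptions, not the Pulsar implementation. The `Record` interface here is a simplified stand-in that declares only a few members. `InMemorySource`, `SimpleRecord`, `onMessage`, and `ackedCount` are hypothetical names. A `LinkedBlockingQueue` stands in for the client of the external system.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for Pulsar's Record interface (assumption): only a few members.
interface Record<T> {
    T getValue();                                    // required: actual data
    default Optional<String> getKey() { return Optional.empty(); } // optional
    default void ack() {}                            // record fully processed
    default void fail() {}                           // record failed to be processed
}

// Hypothetical record that counts acknowledgements on behalf of its source.
class SimpleRecord implements Record<String> {
    private final String value;
    private final AtomicInteger ackCounter;

    SimpleRecord(String value, AtomicInteger ackCounter) {
        this.value = value;
        this.ackCounter = ackCounter;
    }

    @Override public String getValue() { return value; }
    @Override public void ack() { ackCounter.incrementAndGet(); }
}

// Hypothetical source; a real one would implement Pulsar's source interface.
class InMemorySource {
    private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
    private final AtomicInteger acked = new AtomicInteger();

    // Would be called by the external system's client when data arrives.
    void onMessage(String payload) { incoming.add(payload); }

    // take() blocks until a message is available, so the result is never null.
    public Record<String> read() throws Exception {
        return new SimpleRecord(incoming.take(), acked);
    }

    int ackedCount() { return acked.get(); }
}
```

Handing messages to read through a queue keeps the external client's callback thread separate from the thread that calls read, and gives the blocking behavior without polling.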
For more information about how to create a source connector, see
    /**
     * Open connector with configuration
     *
     * @param config initialization config
     * @param sinkContext
     * @throws Exception IO type exceptions when opening a connector
     */
    void open(final Map<String, Object> config, SinkContext sinkContext) throws Exception;

    /**
     * Write a message to Sink
     * @param record record to write to sink
     * @throws Exception
     */
    void write(Record<T> record) throws Exception;
During the implementation, you can decide how to write the Value and Key to the external system, and leverage all the provided information, such as PartitionId and RecordSequence, to achieve different processing guarantees.
You also need to ack records (if messages are sent successfully) or fail records (if messages fail to send).
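The ack and fail handling in a sink can be sketched as follows. The sketch assumes a simplified stand-in for the `Record` interface and a hypothetical `ExternalClient` for the target system. `ForwardingSink` and `TrackingRecord` are illustrative names, not Pulsar classes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Pulsar's Record interface (assumption).
interface Record<T> {
    T getValue();
    default void ack() {}
    default void fail() {}
}

// Hypothetical client for the external system the sink writes to.
interface ExternalClient {
    void send(String payload) throws Exception;
}

// Hypothetical sink; a real one would implement Pulsar's sink interface.
class ForwardingSink {
    private final ExternalClient client;

    ForwardingSink(ExternalClient client) { this.client = client; }

    // Mirrors the write(record) signature shown above.
    public void write(Record<String> record) throws Exception {
        try {
            client.send(record.getValue());
            record.ack();   // the message was sent successfully
        } catch (Exception e) {
            record.fail();  // the message failed to send
        }
    }
}

// Hypothetical record that remembers whether it was acked or failed.
class TrackingRecord implements Record<String> {
    private final String value;
    boolean acked;
    boolean failed;

    TrackingRecord(String value) { this.value = value; }

    @Override public String getValue() { return value; }
    @Override public void ack() { acked = true; }
    @Override public void fail() { failed = true; }
}
```

Because the client is passed in through the constructor, a unit test can substitute an in-memory fake for the external service and check that each record ends up acked or failed.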
Testing connectors can be challenging because Pulsar IO connectors interact with two systems that may be difficult to mock: Pulsar and the system to which the connector connects.
We recommend writing dedicated tests for the connector functionality, as described below, while mocking the external service.
You can create unit tests for your connector.
Once you have written sufficient unit tests, you can add separate integration tests to verify end-to-end functionality.
Pulsar uses testcontainers for all integration tests.
For more information about how to create integration tests for Pulsar connectors, see
Once you've developed and tested your connector, you need to package it so that it can be submitted to a Pulsar Functions cluster.
If you plan to package and distribute your connector for others to use, you are obligated to license and copyright your own code properly. Remember to add the license and copyright to all libraries your code uses and to your distribution.
If you use the NAR method, the NAR plugin automatically creates a DEPENDENCIES file in the generated NAR package, including the proper licensing and copyrights of all libraries of your connector.
NAR stands for NiFi Archive, a custom packaging mechanism used by Apache NiFi to provide a degree of Java ClassLoader isolation.
For more information about how NAR works, see here.
Pulsar uses the same mechanism for packaging all built-in connectors.
The easiest way to package a Pulsar connector is to create a NAR package using nifi-nar-maven-plugin.
To do so, include nifi-nar-maven-plugin in the Maven project for your connector as below.
    <plugins>
      <plugin>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-nar-maven-plugin</artifactId>
        <version>1.2.0</version>
      </plugin>
    </plugins>
For more information about how to use NAR for Pulsar connectors, see
An alternative approach is to create an uber JAR that contains all of the connector's JAR files and other resource files. No internal directory structure is necessary.
You can use maven-shade-plugin to create an uber JAR as below:
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.1.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <filters>
              <filter>
                <artifact>*:*</artifact>
              </filter>
            </filters>
          </configuration>
        </execution>
      </executions>
    </plugin>