Configure state storage
Pulsar Functions use Apache BookKeeper as a state storage interface. Pulsar integrates with BookKeeper table service to store state for functions. For example, a WordCount
function can store the state of its counters into BookKeeper table service via State APIs.
States are key-value pairs, where a key is a string and its value is arbitrary binary data - counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual function and shared between instances of that function.
State storage is not available for Go functions.
Call state APIs
Pulsar Functions expose APIs for mutating and accessing state
. These APIs are available in the Context object when you use Java/Python SDK to develop functions.
The following table outlines the states that can be accessed within Java and Python functions.
State-related API | Java | Python |
---|---|---|
Increment counter | incrCounter incrCounterAsync | incr_counter |
Retrieve counter | getCounter getCounterAsync | get_counter |
Update state | putState putStateAsync | put_state |
Retrieve state | getState getStateAsync | get_state |
Delete state | deleteState | del_counter |
Increment counter
You can use incrCounter
to increment the counter of a given key
by the given amount
.
If the key
does not exist, a new key is created.
- Java
- Python
/**
* Increment the built-in distributed counter referred by key
* @param key The name of the key
* @param amount The amount to be incremented
*/
void incrCounter(String key, long amount);
To asynchronously increment the counter, you can use incrCounterAsync
.
/**
* Increment the built-in distributed counter referred by key
* but dont wait for the completion of the increment operation
*
* @param key The name of the key
* @param amount The amount to be incremented
*/
CompletableFuture<Void> incrCounterAsync(String key, long amount);
def incr_counter(self, key, amount):
"""incr the counter of a given key in the managed state"""
Retrieve counter
You can use getCounter
to retrieve the counter of a given key
mutated by incrCounter
.
- Java
- Python
/**
* Retrieve the counter value for the key.
*
* @param key name of the key
* @return the amount of the counter value for this key
*/
long getCounter(String key);
To asynchronously retrieve the counter mutated by incrCounterAsync
, you can use getCounterAsync
.
/**
* Retrieve the counter value for the key, but don't wait
* for the operation to be completed
*
* @param key name of the key
* @return the amount of the counter value for this key
*/
CompletableFuture<Long> getCounterAsync(String key);
def get_counter(self, key):
"""get the counter of a given key in the managed state"""
Update state
Besides the counter
API, Pulsar also exposes a general key/value API for functions to store and update the state of a given key
.
- Java
- Python
/**
* Update the state value for the key.
*
* @param key name of the key
* @param value state value of the key
*/
void putState(String key, ByteBuffer value);
To asynchronously update the state of a given key
, you can use putStateAsync
.
/**
* Update the state value for the key, but don't wait for the operation to be completed
*
* @param key name of the key
* @param value state value of the key
*/
CompletableFuture<Void> putStateAsync(String key, ByteBuffer value);
def put_state(self, key, value):
"""update the value of a given key in the managed state"""
Retrieve state
You can use getState
to retrieve the state of a given key
.
- Java
- Python
/**
* Retrieve the state value for the key.
*
* @param key name of the key
* @return the state value for the key.
*/
ByteBuffer getState(String key);
To asynchronously retrieve the state of a given key
, you can use getStateAsync
.
/**
* Retrieve the state value for the key, but don't wait for the operation to be completed
*
* @param key name of the key
* @return the state value for the key.
*/
CompletableFuture<ByteBuffer> getStateAsync(String key);
def get_state(self, key):
"""get the value of a given key in the managed state"""
Delete state
Both counters and binary values share the same keyspace, so this API deletes either type.
- Java
/**
* Delete the state value for the key.
*
* @param key name of the key
*/
void deleteState(String key);
Query state via CLI
Besides using the State APIs to store the state of functions in Pulsar's state storage and retrieve it back from the storage, you can use CLI commands to query the state of functions.
bin/pulsar-admin functions querystate \
--tenant <tenant> \
--namespace <namespace> \
--name <function-name> \
--state-storage-url <bookkeeper-service-url> \
--key <state-key> \
[---watch]
If --watch
is specified, the CLI tool keeps running to get the latest value of the provided state-key
.
Example
The example of WordCountFunction
demonstrates how state
is stored within Pulsar Functions.
- Java
- Python
The logic of WordCountFunction is simple and straightforward:
-
The function splits the received
String
into multiple words using regex\\.
. -
For each
word
, the function incrementscounter
by 1 viaincrCounter(key, amount)
.import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import java.util.Arrays;
public class WordCountFunction implements Function<String, Void> {
@Override
public Void process(String input, Context context) throws Exception {
Arrays.asList(input.split("\\.")).forEach(word -> context.incrCounter(word, 1));
return null;
}
}
The logic of this WordCount
function is simple and straightforward:
-
The function first splits the received string into multiple words.
-
For each
word
, the function incrementscounter
by 1 viaincr_counter(key, amount)
.from pulsar import Function
class WordCount(Function):
def process(self, item, context):
for word in item.split():
context.incr_counter(word, 1)