Pulsar SQL configuration and deployment
You can configure Presto Pulsar connector and deploy a cluster with the following instruction.
Configure Presto Pulsar Connector
You can configure Presto Pulsar Connector in the ${project.root}/conf/presto/catalog/pulsar.properties
properties file. The configuration for the connector and the default values are as follows.
# name of the connector to be displayed in the catalog
connector.name=pulsar
# the url of Pulsar broker service
pulsar.broker-service-url=http://localhost:8080
# URI of Zookeeper cluster
pulsar.zookeeper-uri=localhost:2181
# minimum number of entries to read at a single time
pulsar.entry-read-batch-size=100
# default number of splits to use per query
pulsar.target-num-splits=4
You can connect Presto to a Pulsar cluster with multiple hosts. To configure multiple hosts for brokers, add multiple URLs to pulsar.broker-service-url
. To configure multiple hosts for ZooKeeper, add multiple URIs to pulsar.zookeeper-uri
. The following is an example.
pulsar.broker-service-url=http://localhost:8080,localhost:8081,localhost:8082
pulsar.zookeeper-uri=localhost1,localhost2:2181
Query data from existing Presto clusters
If you already have a Presto cluster, you can copy the Presto Pulsar connector plugin to your existing cluster. Download the archived plugin package with the following command.
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.5.0/apache-pulsar-2.5.0-bin.tar.gz
Deploy a new cluster
Since Pulsar SQL is powered by Presto, the configuration for deployment is the same for the Pulsar SQL worker.
For how to set up a standalone single node environment, refer to Query data.
You can use the same CLI args as the Presto launcher.
$ ./bin/pulsar sql-worker --help
Usage: launcher [options] command
Commands: run, start, stop, restart, kill, status
Options:
-h, --help show this help message and exit
-v, --verbose Run verbosely
--etc-dir=DIR Defaults to INSTALL_PATH/etc
--launcher-config=FILE
Defaults to INSTALL_PATH/bin/launcher.properties
--node-config=FILE Defaults to ETC_DIR/node.properties
--jvm-config=FILE Defaults to ETC_DIR/jvm.config
--config=FILE Defaults to ETC_DIR/config.properties
--log-levels-file=FILE
Defaults to ETC_DIR/log.properties
--data-dir=DIR Defaults to INSTALL_PATH
--pid-file=FILE Defaults to DATA_DIR/var/run/launcher.pid
--launcher-log-file=FILE
Defaults to DATA_DIR/var/log/launcher.log (only in
daemon mode)
--server-log-file=FILE
Defaults to DATA_DIR/var/log/server.log (only in
daemon mode)
-D NAME=VALUE Set a Java system property
The default configuration for the cluster is located in ${project.root}/conf/presto
. You can customize your deployment by modifying the default configuration.
You can set the worker to read from a different configuration directory, or set a different directory to write data.
$ ./bin/pulsar sql-worker run --etc-dir /tmp/incubator-pulsar/conf/presto --data-dir /tmp/presto-1
You can start the worker as daemon process.
$ ./bin/pulsar sql-worker start
Deploy a cluster on multiple nodes
You can deploy a Pulsar SQL cluster or Presto cluster on multiple nodes. The following example shows how to deploy a cluster on three-node cluster.
- Copy the Pulsar binary distribution to three nodes.
The first node runs as Presto coordinator. The minimal configuration requirement in the ${project.root}/conf/presto/config.properties
file is as follows.
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=<coordinator-url>
The other two nodes serve as worker nodes, you can use the following configuration for worker nodes.
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=<coordinator-url>
-
Modify
pulsar.broker-service-url
andpulsar.zookeeper-uri
configuration in the${project.root}/conf/presto/catalog/pulsar.properties
file accordingly for the three nodes. -
Start the coordinator node.
$ ./bin/pulsar sql-worker run
- Start worker nodes.
$ ./bin/pulsar sql-worker run
- Start the SQL CLI and check the status of your cluster.
$ ./bin/pulsar sql --server <coordinate_url>
- Check the status of your nodes.
presto> SELECT * FROM system.runtime.nodes;
node_id | http_uri | node_version | coordinator | state
---------+-------------------------+--------------+-------------+--------
1 | http://192.168.2.1:8081 | testversion | true | active
3 | http://192.168.2.2:8081 | testversion | false | active
2 | http://192.168.2.3:8081 | testversion | false | active
For more information about deployment in Presto, refer to Presto deployment.