
Deploy a Pulsar cluster on AWS using Terraform and Ansible

For instructions on deploying a single Pulsar cluster manually rather than using Terraform and Ansible, see Deploying a Pulsar cluster on bare metal. For instructions on manually deploying a multi-cluster Pulsar instance, see Deploying a Pulsar instance on bare metal.

One of the easiest ways to get a Pulsar cluster running on Amazon Web Services (AWS) is to use the Terraform infrastructure provisioning tool and the Ansible server automation tool. Terraform creates the resources necessary for running the Pulsar cluster (EC2 instances, networking and security infrastructure, and so on), while Ansible installs and runs Pulsar on the provisioned resources.

To deploy a Pulsar cluster on AWS, complete the following steps.

Requirements and setup

To install a Pulsar cluster on AWS using Terraform and Ansible, you need the following:

An AWS account and the aws command-line tool
Python and pip
The terraform-inventory tool, which enables Ansible to use Terraform artifacts

You also need to make sure that you are currently logged into your AWS account via the aws tool:

aws configure

Step 1: Installation

You can install Ansible on Linux or macOS using pip.

pip install ansible

You can install Terraform by following the official Terraform installation instructions.

You also need to have the Terraform and Ansible configuration for Pulsar locally on your machine. You can find them in the GitHub repository of Pulsar, which you can fetch using Git commands:

git clone https://github.com/apache/pulsar
cd pulsar/deployment/terraform-ansible/aws
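Before moving on, it can help to confirm that the tools named so far are actually on your PATH. The following is a minimal sketch rather than part of the Pulsar tooling: the `REQUIRED_TOOLS` list and the `missing_tools` helper are hypothetical names, and the list assumes that Ansible exposes the `ansible-playbook` executable used later in this guide.

```python
import shutil

# Command names taken from the requirements and installation steps above.
# This list is an assumption for illustration; adjust it to your environment.
REQUIRED_TOOLS = [
    "aws",
    "pip",
    "git",
    "terraform",
    "terraform-inventory",
    "ansible-playbook",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


# Prints the list of missing tools, or a confirmation when none are missing.
print(missing_tools() or "All required tools are on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use you can call `missing_tools()` without arguments.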

Step 2: SSH setup

If you already have an SSH key and want to use it, you can skip generating a new key. Instead, update the private_key_file setting in the ansible.cfg file and the public_key_path setting in the terraform.tfvars file.

For example, if you already have a private SSH key in ~/.ssh/pulsar_aws and a public key in ~/.ssh/pulsar_aws.pub, follow the steps below:

Update ansible.cfg with the following value:

private_key_file=~/.ssh/pulsar_aws

Update terraform.tfvars with the following value (string values in this file must be quoted):

public_key_path = "~/.ssh/pulsar_aws.pub"
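If you prefer to script the ansible.cfg change, the sketch below shows one way to do it with Python's standard `configparser` module. It assumes that `private_key_file` lives in the `[defaults]` section, which is where Ansible reads it from; the `set_private_key` helper is a hypothetical name, and the sample file content in the usage note is illustrative rather than a copy of the repository's file.

```python
import configparser
import io


def set_private_key(cfg_text, key_path):
    """Return ansible.cfg text with private_key_file set under [defaults]."""
    # Disable interpolation so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cfg_text)
    if not parser.has_section("defaults"):
        parser.add_section("defaults")
    parser.set("defaults", "private_key_file", key_path)
    out = io.StringIO()
    # Write key=value without spaces, matching the style shown above.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

For example, `set_private_key("[defaults]\nprivate_key_file=~/.ssh/id_rsa\n", "~/.ssh/pulsar_aws")` returns text in which the setting points at the new key. Note that `configparser` drops comments when it rewrites a file, so editing by hand is safer if your ansible.cfg contains comments you want to keep.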

To create the necessary AWS resources using Terraform, you need an SSH key. If you do not already have one, enter the following command to create a private SSH key in ~/.ssh/id_rsa and a public key in ~/.ssh/id_rsa.pub:

ssh-keygen -t rsa

Do not enter a passphrase; press Enter when you are prompted for one. Then enter the following command to verify that the key pair has been created:

ls ~/.ssh
id_rsa               id_rsa.pub

Step 3: Create AWS resources using Terraform

To start building AWS resources with Terraform, you need to install all Terraform dependencies. Enter the following command:

terraform init
# This will create a .terraform folder

After that, you can apply the default Terraform configuration by entering this command:

terraform apply

Terraform then displays the following prompt:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

Type yes and press Enter. Applying the configuration can take several minutes. When Terraform finishes, it prints Apply complete! along with some other information, including the number of resources created.

Apply a non-default configuration

You can apply a non-default Terraform configuration by changing the values in the terraform.tfvars file. The following variables are available:

Variable name	Description	Default
`public_key_path`	The path of the public key that you have generated.	`~/.ssh/id_rsa.pub`
`region`	The AWS region in which the Pulsar cluster runs	`us-west-2`
`availability_zone`	The AWS availability zone in which the Pulsar cluster runs	`us-west-2a`
`aws_ami`	The Amazon Machine Image (AMI) that the cluster uses	`ami-9fa343e7`
`num_zookeeper_nodes`	The number of ZooKeeper nodes in the ZooKeeper cluster	3
`num_bookie_nodes`	The number of bookies that run in the cluster	3
`num_broker_nodes`	The number of Pulsar brokers that run in the cluster	2
`num_proxy_nodes`	The number of Pulsar proxies that run in the cluster	1
`base_cidr_block`	The root CIDR that network assets use for the cluster	`10.0.0.0/16`
`instance_types`	The EC2 instance types to be used. This variable is a map with four keys: `zookeeper` for the ZooKeeper instances, `bookie` for the BookKeeper bookies, and `broker` and `proxy` for the Pulsar brokers and proxies	`t2.small` (ZooKeeper), `i3.xlarge` (BookKeeper) and `c5.2xlarge` (Brokers/Proxies)
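To illustrate how these variables are written in terraform.tfvars, here is a small sketch that renders a Python dict as tfvars lines: strings are quoted, numbers are bare, and maps use braces. The `render_tfvars` helper is a hypothetical name, and the override values in the usage note are examples only, not recommendations.

```python
import json


def render_tfvars(overrides):
    """Render a dict of variable overrides as terraform.tfvars text."""

    def fmt(value):
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        if isinstance(value, dict):
            items = "\n".join(f"  {json.dumps(str(k))} = {fmt(v)}" for k, v in value.items())
            return "{\n" + items + "\n}"
        # JSON string quoting is used here as a close match for quoted strings.
        return json.dumps(str(value))

    return "\n".join(f"{name} = {fmt(value)}" for name, value in overrides.items()) + "\n"
```

For example, `render_tfvars({"num_broker_nodes": 3, "instance_types": {"broker": "c5.2xlarge"}})` produces a `num_broker_nodes = 3` line followed by an `instance_types` map. If you change `region`, remember that AMI IDs are specific to a region, so `aws_ami` and `availability_zone` need to change with it.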

note

This Terraform/Ansible recipe provisions ZooKeeper as the metadata store. For new clusters, Oxia is the recommended metadata store; deploy it separately following the Oxia documentation and point the brokers and bookies at it as described in Configure metadata store.

What is installed

When you run the Ansible playbook, the following AWS resources are used:

9 total Elastic Compute Cloud (EC2) instances running the ami-9fa343e7 Amazon Machine Image (AMI), which runs Red Hat Enterprise Linux (RHEL) 7.4. By default, that includes:
- 3 small VMs for ZooKeeper (t3.small instances)
- 3 larger VMs for BookKeeper bookies (i3.xlarge instances)
- 2 larger VMs for Pulsar brokers (c5.2xlarge instances)
- 1 larger VM for the Pulsar proxy (a c5.2xlarge instance)
An EC2 security group
A virtual private cloud (VPC) for security
An API Gateway for connections from the outside world
A route table for the Pulsar cluster's VPC
A subnet for the VPC

All EC2 instances for the cluster run in the us-west-2 region.
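The total of 9 instances follows directly from the default node counts in terraform.tfvars. As a quick worked check, the sketch below sums those defaults; `total_instances` is a hypothetical helper, and you can use it to estimate the instance count for a non-default configuration.

```python
# Default node counts from the variables table in this guide.
DEFAULT_NODE_COUNTS = {
    "num_zookeeper_nodes": 3,
    "num_bookie_nodes": 3,
    "num_broker_nodes": 2,
    "num_proxy_nodes": 1,
}


def total_instances(node_counts):
    """Return the number of EC2 instances implied by the node counts."""
    return sum(node_counts.values())


print(total_instances(DEFAULT_NODE_COUNTS))  # 3 + 3 + 2 + 1 = 9
```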

Fetch your Pulsar connection URL

When you apply the Terraform configuration by entering the command terraform apply, Terraform outputs a value for the pulsar_service_url. The value should look something like this:

pulsar://pulsar-elb-1800761694.us-west-2.elb.amazonaws.com:6650

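You can fetch that value at any time by entering the command terraform output pulsar_service_url or by parsing the terraform.tfstate file (which is JSON, even though the filename does not reflect that). The filter below matches the state layout written by Terraform versions earlier than 0.12; in state files written by Terraform 0.12 or later, outputs sit at the top level, so use .outputs.pulsar_service_url.value instead:

jq '.modules[0].outputs.pulsar_service_url.value' terraform.tfstate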
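If you would rather read the state file from a script, the following sketch extracts the same value in Python. It assumes one of two layouts: outputs at the top level of the state (Terraform 0.12 and later) or outputs nested under the first entry of `modules` (earlier versions). The function names are hypothetical.

```python
import json


def pulsar_service_url(state):
    """Return the pulsar_service_url output from a parsed terraform.tfstate."""
    # Terraform 0.12 and later keep outputs at the top level of the state.
    outputs = state.get("outputs")
    if not outputs:
        # Older state files nest outputs under the root module.
        modules = state.get("modules") or [{}]
        outputs = modules[0].get("outputs", {})
    return outputs["pulsar_service_url"]["value"]


def read_service_url(path="terraform.tfstate"):
    """Load the state file at path and return the Pulsar service URL."""
    with open(path) as f:
        return pulsar_service_url(json.load(f))
```

Calling `read_service_url()` from the directory that contains terraform.tfstate returns the connection URL as a string. A `KeyError` means that the state has no `pulsar_service_url` output, for example because terraform apply has not completed.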

Destroy your cluster

At any point, you can destroy all AWS resources associated with your cluster using Terraform's destroy command:

terraform destroy

Step 4: Set up disks

Before you run the Pulsar playbook, you need to mount the disks to the correct directories on the bookie nodes. Because different instance types have different disk layouts, you need to update the task defined in the setup-disk.yaml file if you change instance_types in your Terraform configuration.

To set up disks on the bookie nodes, enter this command:

ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
setup-disk.yaml

If you use Terraform 0.12 or later and terraform-inventory fails with the error "Error reading tfstate file", set TF_STATE=./ before the ansible-playbook command:

TF_STATE=./ \
ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
setup-disk.yaml

After that, the disks are mounted under /mnt/journal as the journal disk and /mnt/storage as the ledger disk. Run this command only once. If you run it again after you have run the Pulsar playbook, your disks might be erased, which prevents the bookies from starting up.
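Because rerunning the disk setup is destructive, you may want a quick check on a bookie node before running it a second time. The sketch below is one possible guard, not part of the Pulsar playbooks: it parses text in the format of /proc/mounts, where the second whitespace-separated field of each line is the mount point, and reports whether both directories are already mounted. The helper names are hypothetical.

```python
def mounted_points(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])  # second field is the mount point
    return points


def disks_already_set_up(mounts_text, required=("/mnt/journal", "/mnt/storage")):
    """Return True when every required directory is already a mount point."""
    return set(required) <= mounted_points(mounts_text)
```

On a bookie node, `disks_already_set_up(open("/proc/mounts").read())` returning `True` suggests that setup-disk.yaml has already been applied there.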

Step 5: Run the Pulsar playbook

Once you have created the necessary AWS resources using Terraform, you can install and run Pulsar on the Terraform-created EC2 instances using Ansible.

(Optional) If you want to use any built-in IO connectors, edit the Download Pulsar IO packages task in the deploy-pulsar.yaml file and uncomment the connectors you want to use.

To run the playbook, enter this command:

ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
../deploy-pulsar.yaml

If your private SSH key is at a location other than ~/.ssh/id_rsa, specify that location with the --private-key flag:

ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
--private-key="~/.ssh/some-non-default-key" \
../deploy-pulsar.yaml
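The command variants in Steps 4 and 5 differ only in the playbook name, the optional --private-key flag, and the optional TF_STATE variable. As a sketch of how a wrapper script could assemble them, the hypothetical helpers below build the argument list and run it with `subprocess`; they use only the flags shown in this guide.

```python
import os
import subprocess


def playbook_command(playbook, inventory, private_key=None, user="ec2-user"):
    """Build the ansible-playbook argument list used in this guide."""
    cmd = ["ansible-playbook", f"--user={user}", f"--inventory={inventory}"]
    if private_key:
        # Expand "~" here because no shell is involved when the list is run.
        cmd.append(f"--private-key={os.path.expanduser(private_key)}")
    cmd.append(playbook)
    return cmd


def run_playbook(playbook, inventory, private_key=None, tf_state=None):
    """Run the playbook, optionally setting TF_STATE for terraform-inventory."""
    env = dict(os.environ)
    if tf_state is not None:
        env["TF_STATE"] = tf_state
    subprocess.run(playbook_command(playbook, inventory, private_key), env=env, check=True)
```

For example, `run_playbook("../deploy-pulsar.yaml", shutil.which("terraform-inventory"), tf_state="./")` (after `import shutil`) mirrors the TF_STATE=./ form of the command. Passing the arguments as a list avoids shell quoting issues.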

Step 6: Access the cluster

You can now access your running Pulsar cluster using the unique Pulsar connection URL for your cluster, which you can obtain as described in Fetch your Pulsar connection URL.

For a quick demonstration of accessing the cluster, you can use the Python client for Pulsar and the Python shell. First, install the Pulsar Python module using pip:

pip install pulsar-client

Now, open up the Python shell using the python command:

python

Once you are in the shell, enter the following commands. In Python 3, the message payload must be bytes, so the example sends a bytes literal:

>>> import pulsar
>>> # Make sure to use your own connection URL
>>> client = pulsar.Client('pulsar://pulsar-elb-1800761694.us-west-2.elb.amazonaws.com:6650')
>>> producer = client.create_producer('persistent://public/default/test-topic')
>>> producer.send(b'Hello world')
>>> client.close()

If all of these commands are successful, Pulsar clients can now use your cluster!
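To confirm that messages can also be read back, you could extend the demonstration with a consumer. The sketch below is a minimal example under stated assumptions: `round_trip` is a hypothetical helper, the subscription name `demo-sub` is arbitrary, and `client` is expected to be a `pulsar.Client` connected to your own service URL.

```python
def round_trip(client, topic, payloads, subscription="demo-sub", timeout_ms=10000):
    """Send each payload to topic and return the strings a consumer receives."""
    # Subscribe before producing so that the subscription exists when
    # the messages are published.
    consumer = client.subscribe(topic, subscription)
    producer = client.create_producer(topic)
    for payload in payloads:
        producer.send(payload.encode("utf-8"))  # send() expects bytes
    received = []
    for _ in payloads:
        msg = consumer.receive(timeout_millis=timeout_ms)
        consumer.acknowledge(msg)  # mark the message as processed
        received.append(msg.data().decode("utf-8"))
    return received


# Usage against a real cluster (replace the URL with your connection URL):
#   import pulsar
#   client = pulsar.Client("pulsar://<your-connection-url>:6650")
#   try:
#       print(round_trip(client, "persistent://public/default/test-topic", ["Hello", "world"]))
#   finally:
#       client.close()
```

Taking the client as a parameter keeps the helper independent of how the connection is created, and the `try`/`finally` in the usage comment makes sure that the client is closed even if a receive times out.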
