Deploying a Pulsar cluster on AWS using Terraform and Ansible
For instructions on deploying a single Pulsar cluster manually rather than using Terraform and Ansible, see Deploying a Pulsar cluster on bare metal. For instructions on manually deploying a multi-cluster Pulsar instance, see Deploying a Pulsar instance on bare metal.
One of the easiest ways to get a Pulsar cluster running on Amazon Web Services (AWS) is to use the Terraform infrastructure provisioning tool and the Ansible server automation tool. Terraform creates the resources needed to run the Pulsar cluster (EC2 instances, networking and security infrastructure, and so on), while Ansible installs and runs Pulsar on the provisioned resources.
Requirements and setup
To install a Pulsar cluster on AWS using Terraform and Ansible, you'll need:
- An AWS account and the `aws` command-line tool
- Python and pip
- The `terraform-inventory` tool, which enables Ansible to use Terraform artifacts
You'll also need to make sure that you're currently logged into your AWS account via the `aws` tool:
$ aws configure
Installation
You can install Ansible on Linux or macOS using pip.
$ pip install ansible
You can install Terraform by following the installation instructions on the Terraform website.
You'll also need to have the Terraform and Ansible configurations for Pulsar locally on your machine. They're contained in Pulsar's GitHub repository, which you can fetch using Git:
$ git clone https://github.com/apache/pulsar
$ cd pulsar/deployment/terraform-ansible/aws
SSH setup
If you already have an SSH key that you'd like to use, you can skip generating a new one. Instead, update the `private_key_file` setting in the `ansible.cfg` file and the `public_key_path` setting in the `terraform.tfvars` file.
For example, if you already have a private SSH key in `~/.ssh/pulsar_aws` and a public key in `~/.ssh/pulsar_aws.pub`, you can do the following:
- Update `ansible.cfg` with the following value:
private_key_file=~/.ssh/pulsar_aws
- Update `terraform.tfvars` with the following value:
public_key_path = "~/.ssh/pulsar_aws.pub"
To create the necessary AWS resources using Terraform, you'll need an SSH key. If you don't already have one, the following command creates a private SSH key in `~/.ssh/id_rsa` and a public key in `~/.ssh/id_rsa.pub`:
$ ssh-keygen -t rsa
Do not enter a passphrase (hit Enter when prompted instead). To verify that a key has been created:
$ ls ~/.ssh
id_rsa id_rsa.pub
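The same check can be scripted, which is handy if key creation is part of a larger setup script. The following is a minimal sketch; the helper name `missing_key_files` is hypothetical and is not part of the Pulsar repository.

```python
from pathlib import Path


def missing_key_files(ssh_dir, key_name="id_rsa"):
    """Return the names of the expected key files that do not exist.

    The private key is checked first, then the public key (key_name + ".pub").
    """
    ssh_dir = Path(ssh_dir).expanduser()
    expected = [ssh_dir / key_name, ssh_dir / (key_name + ".pub")]
    return [path.name for path in expected if not path.is_file()]
```

Calling `missing_key_files("~/.ssh")` returns an empty list when both `id_rsa` and `id_rsa.pub` are present. For a key stored under a different name, pass it as the second argument, for example `missing_key_files("~/.ssh", "pulsar_aws")`.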
Creating AWS resources using Terraform
To get started building AWS resources with Terraform, you'll need to install all Terraform dependencies:
$ terraform init
# This will create a .terraform folder
Once you've done that, you can apply the default Terraform configuration:
$ terraform apply
You should then see this prompt:
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value:
Type `yes` and hit Enter. Applying the configuration could take several minutes. When it's finished, you should see `Apply complete!` along with some other information, including the number of resources created.
Applying a non-default configuration
You can apply a non-default Terraform configuration by changing the values in the `terraform.tfvars` file. The following variables are available:
Variable name | Description | Default |
---|---|---|
public_key_path | The path of the public key that you've generated. | ~/.ssh/id_rsa.pub |
region | The AWS region in which the Pulsar cluster will run | us-west-2 |
availability_zone | The AWS availability zone in which the Pulsar cluster will run | us-west-2a |
aws_ami | The Amazon Machine Image (AMI) that will be used by the cluster | ami-9fa343e7 |
num_zookeeper_nodes | The number of ZooKeeper nodes in the ZooKeeper cluster | 3 |
num_bookie_nodes | The number of bookies that will run in the cluster | 3 |
num_broker_nodes | The number of Pulsar brokers that will run in the cluster | 2 |
num_proxy_nodes | The number of Pulsar proxies that will run in the cluster | 1 |
base_cidr_block | The root CIDR that will be used by network assets for the cluster | 10.0.0.0/16 |
instance_types | The EC2 instance types to be used. This variable is a map with four keys: zookeeper for the ZooKeeper instances, bookie for the BookKeeper bookies, and broker and proxy for the Pulsar brokers and proxies | t2.small (ZooKeeper), i3.xlarge (BookKeeper) and c5.2xlarge (Brokers/Proxies) |
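
As an illustration of how overrides for these variables might be written out, here is a small sketch that renders a Python dictionary as `terraform.tfvars` content. The helper names `to_hcl` and `render_tfvars` are hypothetical, the sketch handles only strings, integers, booleans, and flat maps, and it does not escape special characters inside strings.

```python
def to_hcl(value):
    """Render a Python value as an HCL literal (strings, ints, bools, flat maps)."""
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, dict):
        inner = "\n".join(f'  "{key}" = {to_hcl(item)}' for key, item in value.items())
        return "{\n" + inner + "\n}"
    # Everything else is treated as a string; no escaping is attempted.
    return f'"{value}"'


def render_tfvars(overrides):
    """Return tfvars text with one `name = value` assignment per override."""
    lines = [f"{name} = {to_hcl(value)}" for name, value in overrides.items()]
    return "\n".join(lines) + "\n"


# Illustrative values only: one extra broker and a different bookie instance type.
example = render_tfvars({
    "num_broker_nodes": 3,
    "instance_types": {
        "zookeeper": "t2.small",
        "bookie": "i3.2xlarge",
        "broker": "c5.2xlarge",
        "proxy": "c5.2xlarge",
    },
})
```

Printing `example` shows text that could be pasted into `terraform.tfvars`. Keep in mind that AMI IDs are specific to a region, so changing `region` generally means changing `aws_ami` and `availability_zone` as well.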
What is installed
When you run the Ansible playbook, the following AWS resources will be used:
- 9 total Elastic Compute Cloud (EC2) instances running the ami-9fa343e7 Amazon Machine Image (AMI), which runs Red Hat Enterprise Linux (RHEL) 7.4. By default, that includes:
  - 3 small VMs for ZooKeeper (t2.small instances)
  - 3 larger VMs for BookKeeper bookies (i3.xlarge instances)
  - 2 larger VMs for Pulsar brokers (c5.2xlarge instances)
  - 1 larger VM for the Pulsar proxy (a c5.2xlarge instance)
- An EC2 security group
- A virtual private cloud (VPC) for security
- An API Gateway for connections from the outside world
- A route table for the Pulsar cluster's VPC
- A subnet for the VPC
All EC2 instances for the cluster will run in the us-west-2 region.
Fetching your Pulsar connection URL
When you apply the Terraform configuration by running `terraform apply`, Terraform will output a value for `pulsar_service_url`. It should look something like this:
pulsar://pulsar-elb-1800761694.us-west-2.elb.amazonaws.com:6650
You can fetch that value at any time by running `terraform output pulsar_service_url` or by parsing the `terraform.tfstate` file (which is JSON, even though the filename doesn't reflect that):
$ cat terraform.tfstate | jq '.modules[0].outputs.pulsar_service_url.value'
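If `jq` isn't available, the state file can also be read with a few lines of Python. The sketch below follows the same `.modules[0].outputs` path as the `jq` command and additionally checks for a top-level `outputs` object, on the assumption that newer Terraform releases store outputs there. The function name `service_url_from_state` is hypothetical.

```python
import json


def service_url_from_state(state, output_name="pulsar_service_url"):
    """Extract an output value from a parsed terraform.tfstate document."""
    # Assumed newer layout: outputs sit at the top level of the state.
    outputs = state.get("outputs")
    if outputs is None:
        # Older layout, matching the jq path .modules[0].outputs.
        modules = state.get("modules") or [{}]
        outputs = modules[0].get("outputs", {})
    entry = outputs.get(output_name)
    if entry is None:
        raise KeyError(f"output {output_name!r} not found in state")
    return entry["value"]


def load_service_url(path="terraform.tfstate"):
    """Read the state file from disk and return the Pulsar service URL."""
    with open(path) as handle:
        return service_url_from_state(json.load(handle))
```

Running `load_service_url()` from the directory that contains `terraform.tfstate` returns the same string that `terraform output pulsar_service_url` prints.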
Destroying your cluster
At any point, you can destroy all AWS resources associated with your cluster using Terraform's `destroy` command:
$ terraform destroy
Setting up disks
Before you run the Pulsar playbook, you need to mount the disks to the correct directories on the bookie nodes. Because different instance types have different disk layouts, if you change `instance_types` in your Terraform configuration, you also need to update the tasks defined in the `setup-disk.yaml` file.
To set up disks on the bookie nodes, use this command:
$ ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
setup-disk.yaml
After running this command, the disks will be mounted under `/mnt/journal` as the journal disk and `/mnt/storage` as the ledger disk.
It is important to run this command only once! If you run it again after you have run the Pulsar playbook, it may erase your disks and cause the bookies to fail to start up.
Running the Pulsar playbook
Once you've created the necessary AWS resources using Terraform, you can install and run Pulsar on the Terraform-created EC2 instances using Ansible. To do so, use this command:
$ ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
../deploy-pulsar.yaml
If you've created a private SSH key at a location different from `~/.ssh/id_rsa`, you can specify the different location using the `--private-key` flag:
$ ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
--private-key="~/.ssh/some-non-default-key" \
../deploy-pulsar.yaml
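If you drive the deployment from a script rather than typing the commands by hand, the same invocation can be assembled in Python. This is a sketch only: `playbook_command` and `run_playbook` are hypothetical helpers, and they use just the `--user`, `--inventory`, and `--private-key` flags shown above.

```python
import os
import shutil
import subprocess


def playbook_command(playbook, inventory, user="ec2-user", private_key=None):
    """Build the ansible-playbook argument list used in this guide."""
    cmd = ["ansible-playbook", f"--user={user}", f"--inventory={inventory}"]
    if private_key:
        # No shell is involved when passing a list, so expand "~" here.
        cmd.append(f"--private-key={os.path.expanduser(private_key)}")
    cmd.append(playbook)
    return cmd


def run_playbook(playbook, private_key=None):
    """Locate terraform-inventory on PATH and run the given playbook."""
    inventory = shutil.which("terraform-inventory")
    if inventory is None:
        raise FileNotFoundError("terraform-inventory is not on PATH")
    cmd = playbook_command(playbook, inventory, private_key=private_key)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_playbook("../deploy-pulsar.yaml", private_key="~/.ssh/some-non-default-key")` is equivalent to the last command above. Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the script if the playbook fails.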
Accessing the cluster
You can now access your running Pulsar cluster using its unique Pulsar connection URL, which you can obtain using the instructions above.
For a quick demonstration of accessing the cluster, we can use the Python client for Pulsar and the Python shell. First, install the Pulsar Python module using pip:
$ pip install pulsar-client
Now, open up the Python shell using the `python` command:
$ python
Once in the shell, run the following:
>>> import pulsar
>>> client = pulsar.Client('pulsar://pulsar-elb-1800761694.us-west-2.elb.amazonaws.com:6650')
# Make sure to use your connection URL
>>> producer = client.create_producer('persistent://public/default/test-topic')
>>> producer.send(b'Hello world')
>>> client.close()
If all of these commands are successful, your cluster can now be used by Pulsar clients!
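To confirm that messages can be consumed as well as produced, the shell session can be extended into a small round-trip script. The sketch below is one possible way to do this, not an official smoke test. The function names and the `smoke-test` subscription name are hypothetical, and the script assumes the `pulsar-client` package is installed.

```python
from urllib.parse import urlparse


def check_service_url(url):
    """Return (host, port) for a pulsar:// or pulsar+ssl:// URL, else raise."""
    parsed = urlparse(url)
    if parsed.scheme not in ("pulsar", "pulsar+ssl"):
        raise ValueError(f"unexpected scheme in {url!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"host and port are both required in {url!r}")
    return parsed.hostname, parsed.port


def round_trip(service_url, topic="persistent://public/default/test-topic",
               subscription="smoke-test", payload=b"Hello world"):
    """Publish one message and read it back through a subscription."""
    import pulsar  # imported lazily so the URL check works without the client

    check_service_url(service_url)
    client = pulsar.Client(service_url)
    try:
        # Subscribe before sending so the message is kept for this subscription.
        consumer = client.subscribe(topic, subscription)
        producer = client.create_producer(topic)
        producer.send(payload)
        msg = consumer.receive(timeout_millis=10000)
        consumer.acknowledge(msg)
        return msg.data()
    finally:
        client.close()
```

Call `round_trip` with your own connection URL. With a subscription name that hasn't been used before, the return value should be the payload that was sent. If no message arrives within ten seconds, `receive` raises an exception instead.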