
Pulsar SQL Overview

Apache Pulsar is used to store streams of event data, and the event data is structured with predefined fields. With the implementation of the Schema Registry, you can store structured data in Pulsar and query the data by using Trino (formerly Presto SQL).

As the core of Pulsar SQL, the Presto Pulsar connector enables Presto workers within a Presto cluster to query data from Pulsar.
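To make the idea of querying a topic concrete, the following is a minimal sketch of how a query string against the connector might be assembled. It assumes the naming scheme in which the catalog is `pulsar`, the schema is `"tenant/namespace"`, and the table is the topic name; that scheme is not stated in this section, and the helper names `pulsar_table` and `select_all` as well as the topic `generator_test` are hypothetical. The sketch only builds the SQL text; it does not connect to a cluster.

```python
def pulsar_table(tenant, namespace, topic, catalog="pulsar"):
    """Build a fully qualified table name for a Pulsar topic.

    Assumption: the connector exposes a topic as
    <catalog>."<tenant>/<namespace>"."<topic>".
    """
    def quote(identifier):
        # Standard SQL identifier quoting: wrap in double quotes and
        # double any embedded double quote.
        return '"' + identifier.replace('"', '""') + '"'

    schema = tenant + "/" + namespace
    return f"{catalog}.{quote(schema)}.{quote(topic)}"


def select_all(tenant, namespace, topic, limit=10):
    """Return a SELECT statement that reads up to `limit` rows from a topic."""
    return f"SELECT * FROM {pulsar_table(tenant, namespace, topic)} LIMIT {int(limit)}"


if __name__ == "__main__":
    print(select_all("public", "default", "generator_test"))
```

The schema part is quoted because it contains a slash, which is not valid in an unquoted SQL identifier.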

The Pulsar consumer and reader interfaces

Query performance is efficient and highly scalable because Pulsar adopts a two-level, segment-based architecture.

Topics in Pulsar are stored as segments in Apache BookKeeper. Each topic segment is replicated to multiple BookKeeper nodes, which enables concurrent reads and high read throughput. You can configure the number of BookKeeper nodes, and the default number is 3. In the Presto Pulsar connector, data is read directly from BookKeeper, so Presto workers can read concurrently from a horizontally scalable number of BookKeeper nodes.
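The effect of segment replication on read concurrency can be illustrated with a toy model. The sketch below is not BookKeeper client code: the in-memory `stores` dictionary, the round-robin placement, and the function names are all assumptions made for illustration. It shows only the idea that segments spread across several nodes can be fetched in parallel and then reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor

REPLICAS = 3  # default number of BookKeeper nodes per segment, per the text


def place_segments(segments, bookies, replicas=REPLICAS):
    """Copy each segment to `replicas` bookies (hypothetical round-robin placement)."""
    stores = {bookie: {} for bookie in bookies}
    copies = min(replicas, len(bookies))
    for i, seg_id in enumerate(sorted(segments)):
        for r in range(copies):
            bookie = bookies[(i + r) % len(bookies)]
            stores[bookie][seg_id] = segments[seg_id]
    return stores


def read_segment(stores, seg_id):
    """Read one segment from the first bookie that holds a replica of it."""
    for held in stores.values():
        if seg_id in held:
            return seg_id, list(held[seg_id])
    raise KeyError(f"segment {seg_id} not found on any bookie")


def read_topic(stores, segment_ids, workers=4):
    """Fetch all segments concurrently, then return entries in segment order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fetched = dict(pool.map(lambda s: read_segment(stores, s), segment_ids))
    return [entry for seg_id in sorted(fetched) for entry in fetched[seg_id]]


if __name__ == "__main__":
    segments = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
    stores = place_segments(segments, ["bk1", "bk2", "bk3", "bk4"])
    print(read_topic(stores, [2, 0, 1]))
```

In this model, adding bookies spreads the replicas over more nodes, so more worker threads can read without contending for the same node, which mirrors the scaling argument made above.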
