Schema evolution and compatibility
Schemas rarely stay the same over a long period of time. Instead, they evolve to satisfy new needs.
This chapter examines how Pulsar schemas evolve and what the Pulsar schema compatibility check strategies are.
Schema evolution
Pulsar schema is defined in a data structure called SchemaInfo.
Each SchemaInfo stored with a topic has a version. The version is used to manage the schema changes happening within a topic.
The message produced with SchemaInfo is tagged with a schema version. When a message is consumed by a Pulsar client, the Pulsar client can use the schema version to retrieve the corresponding SchemaInfo and use the correct schema information to deserialize data.
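The version lookup described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: ToySchemaRegistry, encode, and decode are hypothetical names, not part of the Pulsar client API, the SchemaInfo class here is a toy stand-in, and JSON stands in for the real wire format.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaInfo:
    """Toy stand-in for Pulsar's SchemaInfo: a name plus field names."""
    name: str
    fields: tuple  # field names this schema version declares


@dataclass
class ToySchemaRegistry:
    """Hypothetical per-topic store that assigns a version to each schema."""
    _versions: list = field(default_factory=list)

    def register(self, schema: SchemaInfo) -> int:
        self._versions.append(schema)
        return len(self._versions) - 1  # version = position in the history

    def lookup(self, version: int) -> SchemaInfo:
        return self._versions[version]


def encode(version: int, record: dict) -> bytes:
    # Tag the payload with the schema version it was produced with.
    return json.dumps({"schema_version": version, "payload": record}).encode()


def decode(registry: ToySchemaRegistry, message: bytes) -> dict:
    envelope = json.loads(message)
    # Use the version tag to fetch the schema the producer used.
    schema = registry.lookup(envelope["schema_version"])
    # Keep only the fields that this schema version declares.
    return {name: envelope["payload"][name] for name in schema.fields}


registry = ToySchemaRegistry()
v0 = registry.register(SchemaInfo("user", ("id",)))
v1 = registry.register(SchemaInfo("user", ("id", "email")))

old_message = encode(v0, {"id": 1})
new_message = encode(v1, {"id": 2, "email": "a@example.com"})

print(decode(registry, old_message))
print(decode(registry, new_message))
```

Because every message carries its own version tag, the consumer in this model never has to guess which schema a payload was written with, even after the topic's schema has changed.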
What is schema evolution?
Schemas store the details of attributes and types. To satisfy new business requirements, schemas inevitably need to be updated over time. This process is called schema evolution.
Any schema change affects downstream consumers. Schema evolution ensures that the downstream consumers can seamlessly handle data encoded with both old schemas and new schemas.
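As an illustration of a consumer handling data written with both an old and a new schema, consider a field that is added with a default value. The sketch below is a simplified model, not Pulsar code: READER_SCHEMA_V2 and read_with_schema are hypothetical names, and a schema is reduced to a mapping from field name to default value.

```python
# Hypothetical reader-side schema: field name -> default value.
# None of these names come from Pulsar; they only illustrate the idea.
READER_SCHEMA_V2 = {"id": None, "email": "unknown"}


def read_with_schema(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader schema, filling in defaults."""
    return {
        name: record.get(name, default)
        for name, default in reader_schema.items()
    }


old_record = {"id": 1}  # written before the "email" field existed
new_record = {"id": 2, "email": "b@example.com"}

print(read_with_schema(old_record, READER_SCHEMA_V2))
print(read_with_schema(new_record, READER_SCHEMA_V2))
```

In this model the old record is still readable because the missing field falls back to its default, which is the kind of guarantee schema evolution is meant to provide.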
How should Pulsar schema evolve?
The answer is the Pulsar schema compatibility check strategy. It determines how old schemas are compared with new schemas in topics.
For more information, see Schema compatibility check strategy.
How does Pulsar support schema evolution?
- When a producer/consumer/reader connects to a broker, the broker deploys the schema compatibility checker configured by schemaRegistryCompatibilityCheckers to enforce schema compatibility check. The schema compatibility checker is one instance per schema type. Currently, Avro and JSON have their own compatibility checkers, while all the other schema types share the default compatibility checker which disables schema evolution.
- The producer/consumer/reader sends its client SchemaInfo to the broker.
- The broker knows the schema type and locates the schema compatibility checker for that type.
- The broker uses the checker to check if the SchemaInfo is compatible with the latest schema of the topic by applying its compatibility check strategy. Currently, the compatibility check strategy is configured at the namespace level and applied to all the topics within that namespace.
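The steps above can be sketched as a lookup of one checker per schema type followed by a check against the topic's latest schema. This is a toy model rather than the broker's implementation: the CHECKERS table, broker_accepts, and both checker functions are hypothetical. The rule that added fields must have defaults is an assumed example strategy, and treating "disables schema evolution" as "accept only an identical schema" is also an assumption.

```python
from typing import Callable, Dict

# Toy schema shape: {"type": "AVRO", "fields": {field_name: has_default}}
Schema = dict
Checker = Callable[[Schema, Schema], bool]


def added_fields_need_defaults(latest: Schema, candidate: Schema) -> bool:
    """Assumed example rule: every field the candidate adds has a default."""
    added = set(candidate["fields"]) - set(latest["fields"])
    return all(candidate["fields"][name] for name in added)


def default_checker(latest: Schema, candidate: Schema) -> bool:
    """Evolution disabled (assumption): only an identical schema passes."""
    return latest == candidate


# One checker instance per schema type; other types share the default.
CHECKERS: Dict[str, Checker] = {
    "AVRO": added_fields_need_defaults,
    "JSON": added_fields_need_defaults,
}


def broker_accepts(latest: Schema, client_schema: Schema) -> bool:
    # Locate the checker for the client's schema type, then compare the
    # client schema with the latest schema stored for the topic.
    checker = CHECKERS.get(client_schema["type"], default_checker)
    return checker(latest, client_schema)


latest = {"type": "AVRO", "fields": {"id": False}}
with_default = {"type": "AVRO", "fields": {"id": False, "email": True}}
without_default = {"type": "AVRO", "fields": {"id": False, "email": False}}

print(broker_accepts(latest, with_default))
print(broker_accepts(latest, without_default))
```

In a real deployment the strategy applied by the checker comes from the namespace configuration, so the same client schema may be accepted in one namespace and rejected in another.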