Avro Schemas
Apache Avro is a compact, row-oriented serialization format designed for high-throughput data systems. MAPS treats Avro as a first-class schema type, with tight integration into the Typed Event pipeline.
1. Format Overview
Avro defines data using a JSON schema and encodes records in a compact binary format.
Key characteristics
- Schema stored as JSON, data encoded as binary
- Strong typing with support for:
  - records, arrays, maps
  - enums, unions, fixed, logical types
- Well-suited for:
  - telemetry streams
  - log/event pipelines
  - long-lived topic-based data that evolves over time
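To make "encodes records in a compact binary format" concrete, the sketch below follows the primitive encodings from the Avro specification for long, double and string. It is illustrative standard-library Python, not MAPS code, and the function names are illustrative.

```python
import struct


def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zig-zag mapping, then a base-128 varint."""
    if not -(2**63) <= n < 2**63:
        raise OverflowError("Avro long is a signed 64-bit value")
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z >= 0x80:  # low 7 bits first; high bit set means "more bytes follow"
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_double(x: float) -> bytes:
    """Encode an Avro double: 8 bytes, IEEE 754, little-endian."""
    return struct.pack("<d", x)


def encode_string(s: str) -> bytes:
    """Encode an Avro string: UTF-8 byte length as a long, then the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data
```

Values carry no field names or type tags, so small integers such as -1 or 64 cost one or two bytes. The flip side is that the bytes are only meaningful with the schema in hand, which is why the schema has to be available wherever the data is decoded.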
Why use Avro in MAPS?
- Efficient binary encoding
- Built-in schema evolution features (defaults, aliases, unions)
- Good fit for high-volume IoT and analytics streams
- Plays well with downstream big-data / lake / warehouse tooling
2. SchemaConfig for Avro
All Avro schemas in MAPS are stored as a SchemaConfig:
- format must be "avro".
- schema holds the Avro JSON schema.
- schemaBase64 is typically null for Avro.
- labels carry routing and discovery metadata (including the CoAP interface/resource when the schema is exposed over CoAP).
2.1 Required fields for Avro
At the SchemaConfig level:
- format → "avro"
- name → logical schema name
- versionId → logical schema version
- schema → valid Avro JSON schema
- labels.matchExpression → regex mapping topics to this schema
- labels.uniqueId → stable schema identifier
- labels.interface → optional: the CoAP if (interface description) value, if exposed via CoAP
- labels.resource → optional: the CoAP rt (resource type) value, if exposed via CoAP
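The required-field list above can be read as a checklist. The following is a hypothetical sketch of such a check, plus topic-to-schema resolution through labels.matchExpression. Two behaviours are assumptions, not documented MAPS behaviour: the regex must match the whole topic (re.fullmatch), and the first matching config wins. The topic pattern used in the example is also hypothetical.

```python
import re
from typing import Any, Optional


def validate_avro_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems with an Avro SchemaConfig (empty if none)."""
    errors = []
    if config.get("format") != "avro":
        errors.append('format must be "avro"')
    for key in ("name", "versionId", "schema"):
        if not config.get(key):
            errors.append(f"missing {key}")
    labels = config.get("labels") or {}
    for key in ("matchExpression", "uniqueId"):
        if not labels.get(key):
            errors.append(f"missing labels.{key}")
    schema = config.get("schema")
    # Light structural check only; a real Avro parser validates far more.
    if isinstance(schema, dict) and schema.get("type") == "record":
        if not schema.get("name") or not isinstance(schema.get("fields"), list):
            errors.append("record schema needs a name and a fields list")
    return errors


def resolve_config(topic: str, configs: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the first config whose labels.matchExpression matches the whole topic."""
    for config in configs:
        pattern = (config.get("labels") or {}).get("matchExpression")
        if pattern and re.fullmatch(pattern, topic):
            return config
    return None


# Hypothetical config and topic pattern, for illustration only.
example = {
    "format": "avro", "name": "BME688-Avro", "versionId": "1",
    "schema": {"type": "record", "name": "BME688Reading", "fields": []},
    "labels": {"uniqueId": "b1dc43de-4c9b-5d86-9425-cf958eeb598d",
               "matchExpression": r"/sensors/bme688/.*"},
}
```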
3. Example Avro SchemaConfig (BME688)
Below is an example Avro-based SchemaConfig for the BME688 sensor payload.
{
"versionId": "1",
"name": "BME688-Avro",
"description": "BME688 VOC, pressure, temperature and humidity telemetry (Avro-encoded)",
"labels": {
"comments": "I2C device BME688 VOC, Pressure, Temperature and Humidity Sensor",
"uniqueId": "b1dc43de-4c9b-5d86-9425-cf958eeb598d",
"resource": "sensor",
"interface": "sensor.bme688"
},
"format": "avro",
"schema": {
"type": "record",
"name": "BME688Reading",
"namespace": "io.mapsmessaging.sensors",
"fields": [
{
"name": "temperature",
"type": "double",
"doc": "Unit: °C, range -40.0 to 85.0"
},
{
"name": "humidity",
"type": "double",
"doc": "Unit: %RH, range 10.0 to 90.0"
},
{
"name": "pressure",
"type": "double",
"doc": "Unit: hPa, range 300.0 to 1100.0"
},
{
"name": "gas",
"type": "double",
"doc": "Unit: Ω, range 0.0 to 65535.0"
},
{
"name": "heaterStatus",
"type": "string"
},
{
"name": "gasMode",
"type": "string"
},
{
"name": "dewPoint",
"type": "double",
"doc": "Unit: °C, range -50.0 to 100.0"
},
{
"name": "condensationRisk",
"type": "double",
"doc": "Risk score in [0.0, 1.0]"
},
{
"name": "timestamp",
"type": {
"type": "long",
"logicalType": "timestamp-millis"
},
"doc": "Event time, epoch millis"
}
]
}
}
Notes:
- The Avro schema sits directly in schema as standard Avro JSON.
- timestamp uses Avro's logicalType: "timestamp-millis" to align with MAPS' normalised time handling.
- Ranges and units are carried in the Avro doc field.
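Per the Avro specification, timestamp-millis is a long counting milliseconds since the Unix epoch (1970-01-01T00:00:00.000 UTC). A minimal conversion sketch follows; the helper names are illustrative, and rejecting naive datetimes is a design choice of this sketch, not a MAPS rule.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_timestamp_millis(millis: int) -> datetime:
    """Avro timestamp-millis (long) -> timezone-aware UTC datetime, exact to the ms."""
    return EPOCH + timedelta(milliseconds=millis)


def to_timestamp_millis(moment: datetime) -> int:
    """Aware datetime -> Avro timestamp-millis; sub-millisecond parts are floored."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)
```

Using timedelta arithmetic instead of datetime.fromtimestamp(millis / 1000) avoids routing the value through a float, so millisecond values round-trip exactly.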
4. How MAPS Uses Avro Schemas
At runtime, MAPS:
- Resolves the SchemaConfig for a topic via matchExpression / bindings.
- Loads the Avro JSON schema from schema.
- Uses the Avro schema to decode binary Avro payloads into a Typed Event:
  - field names and types come from the Avro schema
  - logical types (like timestamps) are normalised internally
- Passes the Typed Event through:
  - filtering
  - transformations
  - statistics
  - format conversion (e.g. Avro → JSON / Protobuf / CBOR)
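The decode step above can be sketched as follows. A plain dict stands in for MAPS's Typed Event, and JSON output stands in for format conversion. The (name, type, logicalType) field tuples are a hypothetical simplification of the schema. Payloads are assumed to be a bare Avro record with no container or framing bytes.

```python
import io
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_long(buf: io.BytesIO) -> int:
    """Read a base-128 varint and undo the zig-zag mapping (Avro int/long)."""
    shift, result = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if byte[0] < 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_double(buf: io.BytesIO) -> float:
    data = buf.read(8)
    if len(data) != 8:
        raise EOFError("truncated double")
    return struct.unpack("<d", data)[0]


def read_string(buf: io.BytesIO) -> str:
    size = read_long(buf)
    data = buf.read(size)
    if len(data) != size:
        raise EOFError("truncated string")
    return data.decode("utf-8")


READERS = {"long": read_long, "double": read_double, "string": read_string}


def decode_record(fields, payload: bytes) -> dict:
    """Decode fields in schema order; normalise timestamp-millis to datetime."""
    buf = io.BytesIO(payload)
    event = {}
    for name, avro_type, logical in fields:
        value = READERS[avro_type](buf)
        if logical == "timestamp-millis":
            value = EPOCH + timedelta(milliseconds=value)
        event[name] = value
    if buf.read(1):
        raise ValueError("trailing bytes after record")
    return event


def to_json(event: dict) -> str:
    """Stand-in for format conversion: datetimes become ISO-8601 strings."""
    return json.dumps(event, default=lambda v: v.isoformat())
```

Field names come only from the schema; the payload itself contains just the values, in order.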
Schema evolution rules defined at the Avro level (e.g. added fields with defaults) are respected when decoding.
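The record-level resolution rules from the Avro specification can be sketched on already-decoded values. If a reader field (or one of its aliases) exists in the writer's data, its value is used. Otherwise the reader's default applies, and if there is no default, resolution fails. Writer-only fields are ignored. This simplification skips type promotion and nested types. Real Avro resolution also runs during decoding, because the writer schema is needed to walk the bytes. The field names gasResistance and altitude are hypothetical.

```python
from typing import Any

_MISSING = object()


def resolve_record(writer_values: dict[str, Any],
                   reader_fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Project a record decoded with the writer schema onto the reader schema."""
    resolved = {}
    for field in reader_fields:
        # Match by the reader's field name first, then by any of its aliases.
        candidates = [field["name"], *field.get("aliases", [])]
        value = next((writer_values[n] for n in candidates if n in writer_values), _MISSING)
        if value is _MISSING:
            if "default" not in field:
                raise ValueError(f"field {field['name']!r} is missing and has no default")
            value = field["default"]
        resolved[field["name"]] = value
    # Writer fields the reader does not declare are silently dropped.
    return resolved
```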
5. Warnings & Best Practices
- Keep namespace stable; it forms part of the Avro type identity.
- Prefer double for sensor telemetry to avoid unnecessary rounding artefacts.
- Use Avro logical types where appropriate:
  - timestamp-millis / timestamp-micros for event time
  - date for date-only values
- When changing schemas:
  - add fields with sensible defaults
  - avoid incompatible type changes
  - use aliases when renaming fields
- Only use schemaBase64 for Avro if you truly need to store a compiled/binary representation; otherwise keep the canonical form as Avro JSON in schema.