Avro Schemas

Apache Avro is a compact, row-oriented serialization format designed for high-throughput data systems. MAPS treats Avro as a first-class schema type, with tight integration into the Typed Event pipeline.


1. Format Overview

Avro defines data using a JSON schema and encodes records in a compact binary format.

Key characteristics

  • Schema stored as JSON, data encoded as binary
  • Strong typing with support for:
    • records, arrays, maps
    • enums, unions, fixed, logical types
  • Well-suited for:
    • telemetry streams
    • log/event pipelines
    • long-lived topic-based data with evolution over time
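
To make "compact binary format" concrete, here is a minimal sketch of how the Avro specification encodes three primitive types that appear in the schemas on this page. It uses only the Python standard library and is illustrative only; the function names are hypothetical, and production code would normally rely on an Avro library rather than hand-rolled encoders.

```python
import struct


def encode_long(value: int) -> bytes:
    """Avro int/long: zigzag-encode, then write as a base-128 varint."""
    zigzag = (value << 1) ^ (value >> 63)  # small negatives become small positives
    out = bytearray()
    while zigzag & ~0x7F:  # more than 7 bits left: set the continuation bit
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_double(value: float) -> bytes:
    """Avro double: 8 bytes, IEEE 754, little-endian."""
    return struct.pack("<d", value)


def encode_string(value: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_long(len(data)) + data


# A record is the concatenation of its fields in schema order,
# with no field names or tags in the payload.
payload = encode_double(21.5) + encode_string("on") + encode_long(64)
print(len(payload))  # 8 bytes + 3 bytes + 2 bytes = 13
```

Because field names never appear in the payload, the reader needs the schema to make sense of the bytes, which is why the schema is stored separately as JSON.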

Why use Avro in MAPS?

  • Efficient binary encoding
  • Built-in schema evolution features (defaults, aliases, unions)
  • Good fit for high-volume IoT and analytics streams
  • Plays well with downstream big-data / lake / warehouse tooling

2. SchemaConfig for Avro

All Avro schemas in MAPS are stored as a SchemaConfig:

  • format must be "avro".
  • schema holds the Avro JSON schema.
  • schemaBase64 is typically null for Avro.
  • labels carry routing and discovery metadata (including CoAP interface/resource when exposed over CoAP).

2.1 Required fields for Avro

At the SchemaConfig level:

  • format → "avro"
  • name → logical schema name
  • versionId → logical schema version
  • schema → valid Avro JSON schema
  • labels.matchExpression → regex mapping topics to this schema
  • labels.uniqueId → stable schema identifier
  • labels.interface → optional: value of the CoAP "if" (interface description) attribute, when exposed via CoAP
  • labels.resource → optional: value of the CoAP "rt" (resource type) attribute, when exposed via CoAP
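
The field list above can be checked mechanically before a schema is registered. The following is a minimal sketch, assuming the SchemaConfig has already been parsed into a Python dict; validate_avro_schema_config is a hypothetical helper, not part of MAPS, and it uses Python's regex engine, which may differ in detail from the one MAPS applies to matchExpression.

```python
import re

REQUIRED_TOP_LEVEL = ("format", "name", "versionId", "schema")
REQUIRED_LABELS = ("matchExpression", "uniqueId")


def validate_avro_schema_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            problems.append(f"missing field: {key}")
    if "format" in config and config["format"] != "avro":
        problems.append('format must be "avro"')

    labels = config.get("labels", {})
    for key in REQUIRED_LABELS:
        if key not in labels:
            problems.append(f"missing label: {key}")

    # The match expression must at least compile as a regular expression.
    expression = labels.get("matchExpression")
    if expression is not None:
        try:
            re.compile(expression)
        except re.error as exc:
            problems.append(f"invalid matchExpression: {exc}")

    # Shallow structural check only; this does not validate the full Avro schema.
    schema = config.get("schema")
    if isinstance(schema, dict) and schema.get("type") == "record":
        if not isinstance(schema.get("fields"), list):
            problems.append("record schema has no fields list")
    return problems
```

The check is deliberately shallow: it confirms the presence and basic shape of the fields, and leaves full Avro schema validation to an Avro parser.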

3. Example Avro SchemaConfig (BME688)

Below is an example Avro-based SchemaConfig for the BME688 sensor payload.

{
  "versionId": "1",
  "name": "BME688-Avro",
  "description": "BME688 VOC, pressure, temperature and humidity telemetry (Avro-encoded)",
  "labels": {
    "comments": "I2C device BME688 VOC, Pressure, Temperature and Humidity Sensor",
    "uniqueId": "b1dc43de-4c9b-5d86-9425-cf958eeb598d",
    "resource": "sensor",
    "interface": "sensor.bme688"
  },
  "format": "avro",
  "schema": {
    "type": "record",
    "name": "BME688Reading",
    "namespace": "io.mapsmessaging.sensors",
    "fields": [
      {
        "name": "temperature",
        "type": "double",
        "doc": "Unit: °C, range -40.0 to 85.0"
      },
      {
        "name": "humidity",
        "type": "double",
        "doc": "Unit: %RH, range 10.0 to 90.0"
      },
      {
        "name": "pressure",
        "type": "double",
        "doc": "Unit: hPa, range 300.0 to 1100.0"
      },
      {
        "name": "gas",
        "type": "double",
        "doc": "Unit: Ω, range 0.0 to 65535.0"
      },
      {
        "name": "heaterStatus",
        "type": "string"
      },
      {
        "name": "gasMode",
        "type": "string"
      },
      {
        "name": "dewPoint",
        "type": "double",
        "doc": "Unit: °C, range -50.0 to 100.0"
      },
      {
        "name": "condensationRisk",
        "type": "double",
        "doc": "Risk score in [0.0, 1.0]"
      },
      {
        "name": "timestamp",
        "type": {
          "type": "long",
          "logicalType": "timestamp-millis"
        },
        "doc": "Event time, epoch millis"
      }
    ]
  }
}

Notes:

  • The Avro schema sits directly in schema as standard Avro JSON.
  • timestamp uses Avro's logicalType: "timestamp-millis" to align with MAPS' normalised time handling.
  • Ranges and units are carried in the Avro doc field.
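
For producers that build the timestamp field themselves, the sketch below shows one way to convert between a timezone-aware Python datetime and the timestamp-millis representation (milliseconds since the Unix epoch, UTC). The helper names are hypothetical, and how MAPS normalises the value internally is not shown here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(moment: datetime) -> int:
    """Convert an aware datetime to Avro timestamp-millis."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_timestamp_millis(millis: int) -> datetime:
    """Convert Avro timestamp-millis back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


reading_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(to_timestamp_millis(reading_time))  # 1704067200000
```

Rejecting naive datetimes is a design choice in this sketch: it prevents local-time values from silently being written as if they were UTC.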

4. How MAPS Uses Avro Schemas

At runtime, MAPS:

  1. Resolves the SchemaConfig by topic via matchExpression / bindings.
  2. Loads the Avro JSON schema from schema.
  3. Uses the Avro schema to decode binary Avro payloads into a Typed Event:
    • field names and types come from the Avro schema
    • logical types (like timestamps) are normalised internally
  4. Passes the resulting Typed Event through:
    • filtering
    • transformations
    • statistics
    • format conversion (e.g. Avro → JSON / Protobuf / CBOR)
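
Steps 1 to 3 above can be sketched as follows. This is not the MAPS implementation: it is a standard-library illustration that assumes full-string regex matching on the topic, handles only the double, long and string types used in the BME688 example, and returns a plain dict in place of a Typed Event. The function names and the topic string are hypothetical.

```python
import re
import struct


def read_long(buf: bytes, pos: int):
    """Read a zigzag varint; return (value, next position)."""
    raw, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def read_double(buf: bytes, pos: int):
    return struct.unpack_from("<d", buf, pos)[0], pos + 8


def read_string(buf: bytes, pos: int):
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


READERS = {"double": read_double, "long": read_long, "string": read_string}


def resolve_schema(topic: str, configs: list):
    """Step 1: pick the first SchemaConfig whose matchExpression matches the topic."""
    for config in configs:
        if re.fullmatch(config["labels"]["matchExpression"], topic):
            return config
    return None


def decode_record(schema: dict, payload: bytes) -> dict:
    """Steps 2 and 3: walk the schema fields in order and decode each value."""
    event, pos = {}, 0
    for field in schema["fields"]:
        field_type = field["type"]
        if isinstance(field_type, dict):  # e.g. long with a logicalType
            field_type = field_type["type"]
        event[field["name"]], pos = READERS[field_type](payload, pos)
    return event


configs = [{
    "labels": {"matchExpression": "/sensors/bme688/.*"},
    "schema": {"type": "record", "name": "Reading", "fields": [
        {"name": "temperature", "type": "double"},
        {"name": "heaterStatus", "type": "string"},
        {"name": "timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ]},
}]
config = resolve_schema("/sensors/bme688/kitchen", configs)
payload = struct.pack("<d", 21.5) + b"\x04on" + b"\x80\x01"
print(decode_record(config["schema"], payload))
```

Field names come entirely from the schema, so the same bytes decoded against a different schema would yield a different event.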

Schema evolution rules defined at the Avro level (e.g. added fields with defaults) are respected when decoding.


5. Warnings & Best Practices

  • Keep namespace stable; it forms part of the Avro type identity.
  • Prefer double over float for sensor telemetry to avoid unnecessary rounding artefacts.
  • Use Avro logical types where appropriate:
    • timestamp-millis / timestamp-micros for event time
    • date for date-only values
  • When changing schemas:
    • add fields with sensible defaults
    • avoid incompatible type changes
    • use aliases when renaming fields
  • Only use schemaBase64 for Avro if you truly need to store a compiled/binary representation; otherwise keep the canonical form as Avro JSON in schema.
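
The schema-change guidance above (defaults for added fields, aliases for renames) follows Avro's schema resolution rules. The sketch below illustrates the idea on already-decoded records: a reader field takes its value from a writer field with the same name or with one of the reader's aliases, and otherwise falls back to its default. The function project_to_reader and the field names gasResistance and iaq are hypothetical, and a real Avro library performs this resolution during decoding.

```python
def project_to_reader(record: dict, reader_schema: dict) -> dict:
    """Map a record written with an older schema onto a newer reader schema."""
    result = {}
    for field in reader_schema["fields"]:
        # Try the current name first, then any aliases (older names).
        candidates = [field["name"], *field.get("aliases", [])]
        for name in candidates:
            if name in record:
                result[field["name"]] = record[name]
                break
        else:
            # Field absent from the old record: a default is required.
            if "default" not in field:
                raise ValueError(f"no value or default for field {field['name']!r}")
            result[field["name"]] = field["default"]
    return result


# Hypothetical version 2 of a reading schema: "gas" renamed, "iaq" added.
READER_V2 = {
    "type": "record",
    "name": "BME688Reading",
    "namespace": "io.mapsmessaging.sensors",
    "fields": [
        {"name": "temperature", "type": "double"},
        {"name": "gasResistance", "type": "double", "aliases": ["gas"]},
        {"name": "iaq", "type": "double", "default": 0.0},
    ],
}

old_record = {"temperature": 21.5, "gas": 12000.0}
print(project_to_reader(old_record, READER_V2))
# {'temperature': 21.5, 'gasResistance': 12000.0, 'iaq': 0.0}
```

A field added without a default raises an error here, which mirrors why such a change is incompatible with data written under the older schema.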