Overview

Schemas define the structure and types of event payloads. They are the foundation of the MAPS data pipeline.

1. Why Schemas Matter

Validation: Ensures every message matches the expected structure.
Transformation: Enables field-level mapping and enrichment.
Filtering: Makes expressions type-aware and accurate.
Statistics: Allows numeric and string fields to be aggregated correctly.

Without a schema, payloads are opaque and only minimal inference is possible.

2. Supported Formats

Name	Description
AVRO	Apache Avro
CBC	Compact Binary Schema (fixed layout)
CBOR	Concise Binary Object Representation
CSV	Comma-separated values
JSON	JavaScript Object Notation
MessagePack	Binary JSON-like representation
Native	Single Java scalar value (int, long, double, String)
ProtoBuf	Google Protocol Buffers
RAW	Opaque bytes, no structure
XML	Extensible Markup Language

3. Schema Registration

Schemas are stored in the Schema Repository and are bound to contexts, usually topic names. When an event arrives, the server determines which schema applies and uses it to parse, validate, and type the payload.

How Registration Works

- Each schema has a unique ID
  Identifies a concrete schema definition (JSON, Protobuf, Avro, CBC, etc.).
- Schemas are bound to a context
  A context may be:
  - a topic
  - a topic pattern
  - a schema reference inside a processor or transformation
- Multiple versions per context
  MAPS stores all versions. The active one may be:
  - latest
  - pinned ID
  - version chosen by a transformation pipeline
- Fallback behaviour
  If no schema is registered:
  - payload becomes RAW
  - no structural validation
  - stats and transformations run best-effort only
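The binding rules above can be sketched as a small in-memory registry. This is only an illustration, not the MAPS API: the class name, the method names, and the use of shell-style wildcards for topic patterns are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

RAW = "RAW"  # fallback format when no schema is bound to the context


@dataclass
class SchemaRegistry:
    """Toy registry (hypothetical): context pattern -> schema IDs, oldest first."""
    bindings: dict = field(default_factory=dict)  # pattern -> [schema_id, ...]
    pinned: dict = field(default_factory=dict)    # pattern -> pinned schema_id

    def register(self, pattern: str, schema_id: str) -> None:
        # All versions are kept; the newest one is appended last.
        self.bindings.setdefault(pattern, []).append(schema_id)

    def pin(self, pattern: str, schema_id: str) -> None:
        if schema_id not in self.bindings.get(pattern, []):
            raise KeyError(f"{schema_id} is not registered for {pattern}")
        self.pinned[pattern] = schema_id

    def resolve(self, topic: str) -> str:
        # Assumption: an exact topic binding wins over a pattern binding.
        if topic in self.bindings:
            candidates = [topic]
        else:
            candidates = [p for p in self.bindings if fnmatchcase(topic, p)]
        if not candidates:
            return RAW  # nothing registered: payload is treated as RAW
        pattern = candidates[0]
        # A pinned ID overrides "latest".
        return self.pinned.get(pattern, self.bindings[pattern][-1])
```

With two versions registered for `sensors/*`, `resolve("sensors/kitchen")` returns the latest ID until one is pinned, and any unbound topic resolves to `RAW`.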

What the Server Does During Registration

- Parse & validate the schema
  - JSON validated against internal rules
  - Avro / Protobuf / IDL correctness checks
  - CBC validated for layout integrity
- Store the schema in the Schema Repository as an internal unified model.
- Create a context binding
  Mapping: topic/pattern → schema ID
- Update dependent subsystems
  - caches
  - processing engines
  - transform pipelines
  - statistics processors
  - format conversion logic

During Event Processing

Resolve schema based on topic
Load schema (cached)
Decode payload into a Typed Event
Run processing pipeline

Typed events allow MAPS to:

evaluate expressions with correct typing
compute numeric/string stats
run transformations
convert formats safely

Without a schema, payloads remain raw bytes and only limited processing is possible.
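As a rough sketch of the four steps in this subsection (resolve, load from cache, decode, hand off), the following uses a JSON payload and plain dictionaries as hypothetical stand-ins for the Schema Repository and the topic bindings. None of these names come from MAPS itself.

```python
import json
from functools import lru_cache

# Hypothetical stand-ins for topic bindings and stored schemas.
BINDINGS = {"sensors/kitchen": "temp-v1"}
SCHEMAS = {"temp-v1": {"device": str, "celsius": float}}


@lru_cache(maxsize=128)
def load_schema(schema_id: str) -> tuple:
    # Cached lookup; returned as immutable (field, type) pairs.
    return tuple(SCHEMAS[schema_id].items())


def process(topic: str, payload: bytes):
    schema_id = BINDINGS.get(topic)       # 1. resolve schema based on topic
    if schema_id is None:
        return payload                    # no schema: payload stays raw bytes
    fields = load_schema(schema_id)       # 2. load schema (cached)
    data = json.loads(payload)            # 3. decode the payload ...
    typed = {name: typ(data[name]) for name, typ in fields}  # ... into typed values
    return typed                          # 4. ready for the processing pipeline
```

Here an integer `21` in the payload comes out as the float `21.0` because the schema declares the field as a float, while a payload on an unbound topic is returned untouched.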

4. Schema in Processing Flow

Schemas influence every stage of the event lifecycle. Once applied, the payload becomes a Typed Event, enabling correct filtering, transformation, statistics, and format conversion.

High-Level Flow

Typed Event Definition

A Typed Event is the unified internal representation MAPS generates from any schema format:

typed value tree
per-field type metadata
fast indexed paths
normalised fields (timestamps → epoch millis)

The rest of MAPS does not care whether the original payload was JSON, Avro, Protobuf, CBC, etc.
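To make the four properties concrete, here is a minimal sketch of what such a representation could look like. The `TypedField` class, the type names, and the dotted path syntax are assumptions for illustration; the actual internal model is not described in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class TypedField:
    type_name: str  # per-field type metadata, e.g. "string", "float64", "timestamp"
    value: Any


def normalise_timestamp(iso_text: str) -> int:
    """Convert an ISO-8601 timestamp to epoch milliseconds."""
    moment = datetime.fromisoformat(iso_text)
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return (moment - EPOCH) // timedelta(milliseconds=1)


def index_paths(tree: dict, prefix: str = "") -> dict:
    """Flatten a nested typed tree into {"a.b": TypedField} for direct lookup."""
    index = {}
    for key, node in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(node, dict):
            index.update(index_paths(node, path))
        else:
            index[path] = node
    return index
```

A nested tree such as `{"reading": {"celsius": TypedField("float64", 21.5)}}` is then reachable as `index["reading.celsius"]`, regardless of which wire format produced it.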

Step-by-Step Processing

1. Schema Lookup

Find schema bound to the topic/context.
If none exists: event stays opaque.

2. Parsing into Unified Model

Payload decoded via schema rules:

JSON
Protobuf
Avro
CBC
MessagePack / CBOR
XML

Output: Typed Event

3. Filtering

Typed evaluation allows:

numeric comparisons
string logic
boolean checks
proper null handling

All fast, no string-guessing.
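The difference typing makes is easiest to see with a numeric comparison. The two functions below are illustrative only. Treating null as "never matches" is an assumption about what proper null handling means here.

```python
def string_filter(raw_value: str, threshold: str) -> bool:
    # Lexicographic comparison: what happens when values are treated as text.
    return raw_value > threshold


def typed_filter(value, threshold) -> bool:
    # Assumption: a null (None) field never matches instead of raising
    # an error or being coerced to 0.
    if value is None:
        return False
    return value > threshold
```

Compared as text, `"9"` sorts after `"10"`, so `string_filter("9", "10")` wrongly reports a match. With typed values, `typed_filter(9, 10)` gives the numerically correct answer.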

4. Transformations

Typed Events can be mapped into new schema types:

renaming
flattening
nested extraction
type conversion
enrichment
full schema-to-schema translation
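A few of these operations (flattening, renaming, type conversion) can be sketched over plain dictionaries. The function names, the underscore separator, and the field names are hypothetical, and a real transformation would operate on Typed Events, not on dicts.

```python
def flatten(record: dict, prefix: str = "", sep: str = "_") -> dict:
    """Collapse nested objects into a single level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def transform(record: dict, renames: dict, conversions: dict) -> dict:
    """Flatten, then rename fields, then convert the renamed fields' types."""
    out = {}
    for name, value in flatten(record).items():
        target = renames.get(name, name)   # renaming
        convert = conversions.get(target)  # type conversion, if one is declared
        out[target] = convert(value) if convert else value
    return out
```

For example, `transform({"reading": {"celsius": "21.5"}}, {"reading_celsius": "temp_c"}, {"temp_c": float})` yields a single flat field `temp_c` holding a float.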

5. Statistics

Typed Events enable correct:

min/max/avg/median
stddev (Welford)
string frequency
histograms
sliding-window summaries
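The list above names Welford's method for standard deviation. A minimal sketch of that algorithm follows. It updates the mean and the sum of squared deviations one value at a time, so no history has to be stored. The class name and the choice of the sample (n - 1) estimator are assumptions, since the document does not say which variant MAPS reports.

```python
import math


class RunningStats:
    """Single-pass min/max/mean/stddev using Welford's update."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)  # uses the updated mean
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; undefined for fewer than two values.
        if self.count < 2:
            return float("nan")
        return math.sqrt(self._m2 / (self.count - 1))
```

This only works when the field is known to be numeric, which is why correct statistics depend on the schema.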

6. Format Conversion

Typed Events can be re-encoded into:

JSON
Protobuf
Avro
CBC
MessagePack
CBOR
XML
CSV (best-effort conventions)

Without a Schema

payload is opaque
filtering becomes string-based heuristics
transformations are restricted
stats degrade to string mode
cross-format conversions are impossible

5. Schema Compatibility Matrix

Different serialization formats support different field types natively. The matrix below shows what MAPS can safely convert between formats, and where conventions or special handling are required.

Field	Avro	CBC	CBOR	CSV	Protobuf	JSON	MessagePack	XML
stringId (string)	✅	✅	✅	✅	✅	✅	✅	✅
intId (int32)	✅	✅	✅	✅	✅	✅	✅	✅
longId (int64)	✅	✅	✅	⚠️	✅	✅	✅	✅
floatId (float32)	✅	✅	✅	⚠️	✅	✅	✅	✅
doubleId (float64)	✅	✅	✅	⚠️	✅	✅	✅	✅
booleanId	✅	✅	✅	⚠️	✅	✅	✅	✅
bytesId (binary)	✅	✅	✅	⚠️	✅	⚠️	✅	⚠️
arrayId (array<int>)	✅	✅*	✅	⚠️	✅	✅	✅	⚠️
nestedObj	✅	✅	✅	⚠️	✅	✅	✅	✅
enumId	✅	✅	✅	⚠️	✅	✅	⚠️	✅
timestampId	✅†	✅‡	✅§	⚠️	✅¶	✅	✅	✅

This table outlines schema type compatibility across all serialization formats: Avro, CBC, CBOR, CSV, Protobuf, JSON, MessagePack, and XML.

Legend

✅ Native / straightforward support
⚠️ Supported with convention (e.g. stringify, JSON-in-cell, base64, etc.)
✅* CBC arrays require explicit element length/layout
✅† Avro: logicalType: timestamp-millis or timestamp-micros
✅‡ CBC: store integer epoch or ISO string via encalc/decalc
✅§ CBOR: use tag 0/1 (date/time) or plain ISO string
✅¶ Protobuf: use google.protobuf.Timestamp or epoch millis

Notes on Compatibility

CSV

CSV cannot encode structured or binary data natively. MAPS uses conventions such as:

JSON-in-cell for objects or arrays
base64 for binary data
string rendering for timestamps

Usable, but not elegant.
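The three conventions can be illustrated with the Python standard library. This is a sketch of the general idea only: the exact cell formats MAPS emits (for instance the timestamp rendering) are not specified here, so ISO-8601 in UTC is an assumption, as are the function names.

```python
import base64
import csv
import io
import json
from datetime import datetime, timezone


def to_csv_cell(value):
    """Render one typed value as a CSV-safe cell."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")    # base64 for binary
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":"))   # JSON-in-cell
    if isinstance(value, datetime):
        return value.astimezone(timezone.utc).isoformat() # string rendering
    return value


def encode_row(record: dict, columns: list) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([to_csv_cell(record.get(c)) for c in columns])
    return buffer.getvalue()
```

The CSV writer quotes the JSON cell because it contains commas, and a reader needs the schema to know that the cell should be parsed back as an array instead of being kept as text.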

CBC

CBC always requires an explicit field layout. Arrays, enums, timestamps, and nested objects all rely on exact offsets and encoding rules.
Nothing is inferred.

JSON / MessagePack

JSON and MessagePack allow multiple valid encodings for:

timestamps
enums
binary data

MAPS applies internal conventions, but without schemas, conversions can be lossy or ambiguous.
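The ambiguity is easy to reproduce with JSON. In the sketch below the same JSON string can be read either as text or as base64-encoded bytes, and a timestamp can arrive as an ISO string or as an epoch number. Only the declared type settles which reading is right. The type names and the `decode_field` helper are hypothetical and do not reflect the conventions MAPS actually applies.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_field(value, declared_type: str):
    """Interpret a decoded JSON value using the type a schema would declare."""
    if declared_type == "bytes":
        return base64.b64decode(value)
    if declared_type == "timestamp":
        if isinstance(value, str):
            # Assumes the ISO string carries an explicit UTC offset.
            moment = datetime.fromisoformat(value)
            return (moment - EPOCH) // timedelta(milliseconds=1)
        return int(value)  # already epoch milliseconds
    return value


payload = json.loads('{"data": "AAE=", "at": "1970-01-01T00:00:01+00:00"}')
as_text = decode_field(payload["data"], "string")    # kept as the text "AAE="
as_bytes = decode_field(payload["data"], "bytes")    # decoded to two raw bytes
at_millis = decode_field(payload["at"], "timestamp") # normalised to epoch millis
```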

XML

XML supports all MAPS field types, but round-tripping requires consistent rules for:

ordering
attributes vs elements
namespaces

MAPS enforces deterministic mapping to keep cross-format conversions stable.
