
CSV Schemas

CSV is a simple, flat, text-based format. It does not define types, nesting, arrays, or structure.
MAPS supports CSV schemas by providing a minimal configuration that allows CSV rows to be mapped into Typed Events.

This page describes how CSV schemas are defined, how the server interprets them, and how they integrate into the MAPS processing pipeline.


1. Overview

CSV offers:

  • flat rows
  • string-based values
  • optional numeric parsing
  • no native schema language

MAPS wraps CSV with a lightweight schema definition so CSV data can be:

  • validated (header count)
  • converted to Typed Events
  • used in filtering, transformation, and statistics
  • converted to other formats (JSON, Avro, Protobuf, etc.)

2. Schema Format (SchemaConfig)

A CSV schema in MAPS is represented as:

{
  "format": "csv",
  "schema": {
    "headerValues": "col1, col2, col3",
    "interpretNumericStrings": true
  }
}

It maps directly to:

@Getter
@Setter
public static final class CsvConfig {
    private String headerValues;
    private boolean interpretNumericStrings;
}

2.1 headerValues

A comma‑separated list of field names.

  • Used to name each column in the CSV row.
  • Parsed using the uniVocity CsvParser.
  • Whitespace around commas is ignored unless quoted.

Example:

"name, id, email"

Produces fields:

  • name
  • id
  • email
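The split can be sketched in a few lines. MAPS itself parses headers with the uniVocity CsvParser (which also honors quoting), but a simple split-and-trim shows the intent:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: split a headerValues string into field names,
// trimming whitespace around commas. The real parser (uniVocity)
// additionally handles quoted values containing commas.
public class HeaderValues {
    public static List<String> parse(String headerValues) {
        return Arrays.stream(headerValues.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("name, id, email")); // [name, id, email]
    }
}
```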

2.2 interpretNumericStrings

Controls whether MAPS attempts to convert CSV strings into numbers:

When true

  • "42" → int 42
  • "3.14" → double 3.14
  • "00123" → int 123
  • "1e3" → double 1000.0

When false

  • All fields remain strings.

This impacts:

  • typed filtering
  • schema-to-schema conversions
  • statistics accuracy
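A minimal sketch of these rules follows. The class and method names are hypothetical, and it uses Java long/double where the exact runtime types MAPS produces are an implementation detail:

```java
// Hypothetical sketch of the interpretNumericStrings rules:
// try integral first, then floating point, otherwise keep the string.
public class NumericStrings {
    public static Object interpret(String value) {
        try {
            return Long.parseLong(value);     // "42" -> 42, "00123" -> 123
        } catch (NumberFormatException ignored) { }
        try {
            return Double.parseDouble(value); // "3.14" -> 3.14, "1e3" -> 1000.0
        } catch (NumberFormatException ignored) { }
        return value;                         // non-numeric text stays a string
    }
}
```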

3. Typed Event Mapping

Each CSV row becomes a Typed Event.

Example CSV:

alice, 1001, [email protected]

With schema:

{
  "headerValues": "name, id, email",
  "interpretNumericStrings": true
}

Typed Event:

{
  "name": "alice",
  "id": 1001,
  "email": "[email protected]"
}

CSV supports no nested structures.
Every field is top-level.
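The mapping can be sketched as pairing header names with row values. The names, the numeric heuristic, and the example e-mail address below are assumptions for illustration, not MAPS internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: build a flat Typed Event by zipping header names
// with one CSV row, optionally interpreting numeric strings.
public class RowMapper {
    public static Map<String, Object> toTypedEvent(String[] headers, String[] row,
                                                   boolean interpretNumericStrings) {
        Map<String, Object> event = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            String value = row[i].trim();
            event.put(headers[i], interpretNumericStrings ? interpret(value) : value);
        }
        return event;
    }

    // Same heuristic as interpretNumericStrings: integral, then double, else string.
    private static Object interpret(String value) {
        try { return Long.parseLong(value); } catch (NumberFormatException ignored) { }
        try { return Double.parseDouble(value); } catch (NumberFormatException ignored) { }
        return value;
    }
}
```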


4. Example CSV SchemaConfig

{
  "versionId": "1",
  "name": "Temperature CSV",
  "description": "BME688 sensor CSV data",
  "labels": {
    "uniqueId": "3a0d7bc0-9d6c-4c2d-a67f-e37d70f0cafe",
    "resource": "sensor",
    "interface": "sensor.bme688.csv"
  },
  "format": "csv",
  "schema": {
    "headerValues": "timestamp, temperature, humidity",
    "interpretNumericStrings": true
  }
}

5. Limitations of CSV Schemas

CSV is intentionally simple, but this means:

  • ❌ No nested objects
  • ❌ No arrays
  • ❌ No enums
  • ❌ No type constraints (min/max, regex, etc.)
  • ❌ No binary fields
  • ❌ No timestamp type (timestamps arrive as plain strings)

CSV schemas cannot express the richness of other formats.
They are best used for logging, integration with legacy systems, or simple time-series feeds.


6. Integration with MAPS

After parsing and typing, CSV-derived Typed Events work seamlessly with:

  • filtering
  • statistics
  • transformations
  • schema-to-schema conversions
  • multi-protocol publishing

7. Example Usage Flow

7.1 Ingest CSV via MQTT

Topic:

sensors/bme688/csv

Payload:

2025-01-01T10:00:00Z,20.1,45

7.2 MAPS applies schema and produces Typed Event

{
  "timestamp": "2025-01-01T10:00:00Z",
  "temperature": 20.1,
  "humidity": 45
}

7.3 Transform to JSON

MAPS can automatically re‑encode into JSON, Avro, Protobuf, etc.
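As a rough sketch of this last step, a flat Typed Event can be re-encoded as JSON by hand. Real MAPS conversions go through proper serializers; this toy version only covers the flat string/number fields a CSV schema can produce:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Toy JSON encoder for flat Typed Events: numbers are emitted bare,
// everything else is quoted. No escaping, nesting, or null handling.
public class JsonEncode {
    public static String toJson(Map<String, Object> event) {
        return event.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": " + encode(e.getValue()))
                .collect(Collectors.joining(", ", "{", "}"));
    }

    private static String encode(Object value) {
        return value instanceof Number ? value.toString() : "\"" + value + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("timestamp", "2025-01-01T10:00:00Z");
        event.put("temperature", 20.1);
        event.put("humidity", 45L);
        System.out.println(toJson(event));
        // {"timestamp": "2025-01-01T10:00:00Z", "temperature": 20.1, "humidity": 45}
    }
}
```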


8. Best Practices

  • Prefer interpretNumericStrings = true unless you require strict textual fields.
  • Avoid leading/trailing spaces in header values.
  • Ensure the number of CSV columns matches the header count.
  • Use CSV for simple or legacy data, not complex structures.
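The column-count check from the list above could look like the following sketch; how MAPS actually reports the mismatch is an assumption:

```java
// Reject rows whose field count does not match the configured header count.
public class ColumnCheck {
    public static void validate(String[] headers, String[] row) {
        if (row.length != headers.length) {
            throw new IllegalArgumentException(
                    "Expected " + headers.length + " columns but row has " + row.length);
        }
    }
}
```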