
CSV Schemas

CSV is a simple, flat, text-based format. It does not define types, nesting, arrays, or structure.
MAPS supports CSV schemas by providing a minimal configuration that allows CSV rows to be mapped into Typed Events.

This page describes how CSV schemas are defined, how the server interprets them, and how they integrate into the MAPS processing pipeline.


1. Overview

CSV offers:

  • flat rows
  • string-based values
  • optional numeric parsing
  • no native schema language

MAPS wraps CSV with a lightweight schema definition so CSV data can be:

  • validated (header count)
  • converted to Typed Events
  • used in filtering, transformation, and statistics
  • converted to other formats (JSON, Avro, Protobuf, etc.)

2. Schema Format (SchemaConfig)

A CSV schema in MAPS is represented as:

{
  "format": "csv",
  "schema": {
    "headerValues": "col1, col2, col3",
    "interpretNumericStrings": true
  }
}

It maps directly to:

@Getter
@Setter
public static final class CsvConfig {
    private String headerValues;
    private boolean interpretNumericStrings;
}

2.1 headerValues

A comma‑separated list of field names.

  • Used to name each column in the CSV row.
  • Parsed using the uniVocity CsvParser.
  • Whitespace around commas is ignored unless quoted.

Example:

"name, id, email"

Produces fields:

  • name
  • id
  • email
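The split can be sketched in a few lines. MAPS itself parses headers with the uniVocity CsvParser (which also honors quoting), but a simple split-and-trim shows the intent:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: split a headerValues string into field names,
// trimming whitespace around commas. The real parser (uniVocity)
// additionally handles quoted values containing commas.
public class HeaderValues {
    public static List<String> parse(String headerValues) {
        return Arrays.stream(headerValues.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("name, id, email")); // [name, id, email]
    }
}
```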

2.2 interpretNumericStrings

Controls whether MAPS attempts to convert CSV strings into numbers:

When true

  • "42" → int 42
  • "3.14" → double 3.14
  • "00123" → int 123
  • "1e3" → double 1000.0

When false

  • All fields remain strings.

This impacts:

  • typed filtering
  • schema-to-schema conversions
  • statistics accuracy
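A minimal sketch of these rules follows. The class and method names are hypothetical, and it uses Java long/double where the exact runtime types MAPS produces are an implementation detail:

```java
// Hypothetical sketch of the interpretNumericStrings rules:
// try integral first, then floating point, otherwise keep the string.
public class NumericStrings {
    public static Object interpret(String value) {
        try {
            return Long.parseLong(value);     // "42" -> 42, "00123" -> 123
        } catch (NumberFormatException ignored) { }
        try {
            return Double.parseDouble(value); // "3.14" -> 3.14, "1e3" -> 1000.0
        } catch (NumberFormatException ignored) { }
        return value;                         // non-numeric text stays a string
    }
}
```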

3. Typed Event Mapping

Each CSV row becomes a Typed Event.

Example CSV:

alice, 1001, [email protected]

With schema:

{
  "headerValues": "name, id, email",
  "interpretNumericStrings": true
}

Typed Event:

{
  "name": "alice",
  "id": 1001,
  "email": "[email protected]"
}

CSV supports no nested structures.
Every field is top-level.
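The mapping can be sketched as pairing header names with row values. The names, the numeric heuristic, and the example e-mail address below are assumptions for illustration, not MAPS internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: build a flat Typed Event by zipping header names
// with one CSV row, optionally interpreting numeric strings.
public class RowMapper {
    public static Map<String, Object> toTypedEvent(String[] headers, String[] row,
                                                   boolean interpretNumericStrings) {
        Map<String, Object> event = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            String value = row[i].trim();
            event.put(headers[i], interpretNumericStrings ? interpret(value) : value);
        }
        return event;
    }

    // Same heuristic as interpretNumericStrings: integral, then double, else string.
    private static Object interpret(String value) {
        try { return Long.parseLong(value); } catch (NumberFormatException ignored) { }
        try { return Double.parseDouble(value); } catch (NumberFormatException ignored) { }
        return value;
    }
}
```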


4. Example CSV SchemaConfig

{
  "versionId": "1",
  "name": "Temperature CSV",
  "description": "BME688 sensor CSV data",
  "labels": {
    "uniqueId": "3a0d7bc0-9d6c-4c2d-a67f-e37d70f0cafe",
    "resource": "sensor",
    "interface": "sensor.bme688.csv"
  },
  "format": "csv",
  "schema": {
    "headerValues": "timestamp, temperature, humidity",
    "interpretNumericStrings": true
  }
}

5. Limitations of CSV Schemas

CSV is intentionally simple, but this means:

  • ❌ No nested objects
  • ❌ No arrays
  • ❌ No enums
  • ❌ No type constraints (min/max, regex, etc.)
  • ❌ No binary fields
  • ❌ No timestamp type (timestamps arrive as plain strings)

CSV schemas cannot express the richness of other formats.
They are best used for logging, integration with legacy systems, or simple time-series feeds.


6. Integration with MAPS

After parsing and typing, CSV-derived Typed Events work seamlessly with:

  • filtering
  • statistics
  • transformations
  • schema-to-schema conversions
  • multi-protocol publishing

7. Example Usage Flow

7.1 Ingest CSV via MQTT

Topic:

sensors/bme688/csv

Payload:

2025-01-01T10:00:00Z,20.1,45

7.2 MAPS applies schema and produces Typed Event

{
  "timestamp": "2025-01-01T10:00:00Z",
  "temperature": 20.1,
  "humidity": 45
}

7.3 Transform to JSON

MAPS can automatically re‑encode into JSON, Avro, Protobuf, etc.
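As a rough sketch of this last step, a flat Typed Event can be re-encoded as JSON by hand. Real MAPS conversions go through proper serializers; this toy version only covers the flat string/number fields a CSV schema can produce:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Toy JSON encoder for flat Typed Events: numbers are emitted bare,
// everything else is quoted. No escaping, nesting, or null handling.
public class JsonEncode {
    public static String toJson(Map<String, Object> event) {
        return event.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": " + encode(e.getValue()))
                .collect(Collectors.joining(", ", "{", "}"));
    }

    private static String encode(Object value) {
        return value instanceof Number ? value.toString() : "\"" + value + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("timestamp", "2025-01-01T10:00:00Z");
        event.put("temperature", 20.1);
        event.put("humidity", 45L);
        System.out.println(toJson(event));
        // {"timestamp": "2025-01-01T10:00:00Z", "temperature": 20.1, "humidity": 45}
    }
}
```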


8. Best Practices

  • Prefer interpretNumericStrings = true unless you require strict textual fields.
  • Avoid leading/trailing spaces in header values.
  • Ensure the number of CSV columns matches the header count.
  • Use CSV for simple or legacy data, not complex structures.
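The column-count check from the list above could look like the following sketch; how MAPS actually reports the mismatch is an assumption:

```java
// Reject rows whose field count does not match the configured header count.
public class ColumnCheck {
    public static void validate(String[] headers, String[] row) {
        if (row.length != headers.length) {
            throw new IllegalArgumentException(
                    "Expected " + headers.length + " columns but row has " + row.length);
        }
    }
}
```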