Supported Formats
Fairway supports ingestion of the following file formats:
| Format | Extension | Engines | Config Key |
|---|---|---|---|
| CSV | .csv |
DuckDB, PySpark | csv |
| TSV | .tsv, .tab |
DuckDB, PySpark | tsv, tab |
| JSON | .json, .jsonl |
DuckDB, PySpark | json |
| Parquet | .parquet |
DuckDB, PySpark | parquet |
| Fixed-Width | .txt, .dat |
DuckDB, PySpark | fixed_width |
CSV
Default format when not specified.
tables:
- name: "sales_data"
path: "data/raw/sales.csv"
format: "csv"
Features:
- Automatic type inference
- Header detection
- Configurable delimiter via read_options
TSV / Tab-Separated
Tab-delimited files, common in bioinformatics and legacy systems.
tables:
- name: "gene_data"
path: "data/raw/*.tsv"
format: "tsv"
JSON / JSONL
Supports standard JSON arrays or newline-delimited JSON (JSONL).
tables:
- name: "clickstream"
path: "data/raw/clicks.jsonl"
format: "json"
Parquet
Efficient pass-through ingestion for pre-processed data.
tables:
- name: "preprocessed"
path: "data/staged/*.parquet"
format: "parquet"
Fixed-Width
Text files where columns are defined by character positions (no delimiters). Common in mainframe/legacy data exports.
tables:
- name: "legacy_records"
path: "data/raw/*.txt"
format: "fixed_width"
fixed_width_spec: "specs/legacy_spec.yaml"
Requires a spec file defining column positions. See Fixed-Width Format for details.
Spec file format:
columns:
- name: id
start: 0 # 0-indexed position
length: 5
type: INTEGER
trim: true # Strip whitespace (optional)
- name: description
start: 5
length: 30
type: VARCHAR
Read Options
Pass engine-specific options via read_options:
tables:
- name: "pipe_delimited"
path: "data/*.csv"
format: "csv"
read_options:
delim: "|"
header: false
skip: 1
Adding New Formats
New formats require (per RULE-116):
- Test fixtures in
tests/fixtures/formats/<format>/ - Engine implementation in
engines/duckdb_engine.pyandengines/pyspark_engine.py - Tests in
tests/test_fixed_width.py(or similar) - Documentation update