Fairway Data Ingestion

fairway is a portable, scalable data ingestion framework designed for sustainable management of centralized research data.

Core Philosophy

Traditional data ingestion often suffers from undocumented transformations, rigid pipelines, and difficult-to-scale infrastructure. fairway addresses these pain points by being:

  • Config-Driven: Define your pipeline declaratively in YAML rather than only in code.
  • Engine-Agnostic: Shift from local DuckDB processing to distributed PySpark on Slurm with a single config change.
  • HPC-Ready: Native Slurm integration with automatic Spark cluster provisioning.
  • Validation-First: Multi-level sanity and distribution checks are baked into the pipeline.
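
To make the config-driven, engine-agnostic, and validation-first ideas concrete, here is a minimal Python sketch. It is not fairway's actual API: the config keys (engine, required_columns), the function names, and the engine registry are all hypothetical, and the two engine functions are plain stand-ins rather than real DuckDB or PySpark calls. It only illustrates how a single config value might select the execution backend while validation runs first either way.

```python
# Hypothetical sketch: none of these names or config keys come from fairway.
from typing import Callable, Dict, List

Row = Dict[str, object]


def run_local(rows: List[Row]) -> str:
    # Stand-in for a single-machine engine (e.g. DuckDB).
    return f"local engine processed {len(rows)} rows"


def run_distributed(rows: List[Row]) -> str:
    # Stand-in for a distributed engine (e.g. PySpark on Slurm).
    return f"distributed engine processed {len(rows)} rows"


# Registry mapping the config's engine name to an implementation.
ENGINES: Dict[str, Callable[[List[Row]], str]] = {
    "duckdb": run_local,
    "pyspark": run_distributed,
}


def validate(rows: List[Row], required: List[str]) -> None:
    # Fail fast if any row lacks a required column.
    for index, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")


def run_pipeline(config: Dict[str, object], rows: List[Row]) -> str:
    # Validation runs before any engine is chosen.
    validate(rows, list(config.get("required_columns", [])))
    engine_name = str(config["engine"])
    if engine_name not in ENGINES:
        raise ValueError(f"unknown engine: {engine_name}")
    return ENGINES[engine_name](rows)


if __name__ == "__main__":
    rows = [{"id": 1, "value": 3.5}, {"id": 2, "value": 4.0}]
    # In practice this dict would be loaded from a YAML file.
    config = {"engine": "duckdb", "required_columns": ["id", "value"]}
    print(run_pipeline(config, rows))
    # Switching engines is a one-value change in the config.
    config["engine"] = "pyspark"
    print(run_pipeline(config, rows))
```

The pipeline code never names an engine directly, so moving from local to distributed execution in this sketch means changing one value in the config rather than editing the pipeline.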

Where to Start?