Getting Started
Installation
pip install git+https://github.com/DISSC-yale/fairway.git
Initializing a Project
fairway init my_project --engine spark
cd my_project
Running the Pipeline
Local Development (DuckDB)
Run the pipeline locally on your laptop:
fairway run
# Or pass the config file explicitly:
fairway run --config config/fairway.yaml
HPC Execution (Slurm + Spark)
Submit the pipeline to a Slurm cluster with Spark:
# Submit with a Spark cluster
fairway submit --with-spark
# Submit with custom resources (memory, CPU count, wall-clock time limit)
fairway submit --with-spark --mem 64G --cpus 8 --time 48:00:00
# Preview the generated job script without submitting it
fairway submit --with-spark --dry-run
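The dry run is most useful as a checkpoint before a real submission. The sketch below wraps the two documented commands in a hypothetical `submit_with_preview` shell function (not part of fairway) that prints the preview, asks for confirmation, and submits only on a "y" answer. It assumes that resource flags such as `--mem` can be combined with `--dry-run`, which the commands above do not show explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the fairway CLI): preview the job script
# that `fairway submit --with-spark` would generate, ask for confirmation on
# stdin, and only then submit. Extra arguments (e.g. --mem/--cpus/--time) are
# forwarded to both calls; combining them with --dry-run is an assumption.
submit_with_preview() {
  # Print the job script without submitting; stop if the preview fails.
  fairway submit --with-spark --dry-run "$@" || return 1

  local answer
  read -r -p "Submit this job? [y/N] " answer || return 1
  case "$answer" in
    [yY] | [yY][eE][sS])
      # Confirmed: submit for real with the same arguments.
      fairway submit --with-spark "$@"
      ;;
    *)
      echo "Not submitted." >&2
      return 1
      ;;
  esac
}
```

For example, `submit_with_preview --mem 64G --cpus 8 --time 48:00:00` previews the job with those resources and submits it only after you answer "y".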
Checking and Cancelling Jobs
fairway status # Show your running jobs
fairway cancel <JOB_ID> # Cancel a specific job
fairway cancel --all # Cancel all your jobs
Directory Structure
config/: Configuration files.
data/: Data storage.
src/: Custom code.
logs/: Execution logs.
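To confirm that a project directory has this layout (for example, right after `fairway init`, assuming it creates these folders), a small check like the following could help. `check_layout` is a hypothetical helper, not a fairway command, and it assumes the four directories sit directly under the project root.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the fairway CLI): report which of the
# expected top-level directories are missing from a project.
# Usage: check_layout [project_dir]   (defaults to the current directory)
check_layout() {
  local root="${1:-.}"
  local missing=0
  local dir
  for dir in config data src logs; do
    if [ ! -d "$root/$dir" ]; then
      # Report each missing directory on stderr.
      echo "missing: $dir/" >&2
      missing=1
    fi
  done
  # Exit status 0 means every expected directory is present.
  return "$missing"
}
```

Running `check_layout my_project` from the parent directory prints any missing directories to stderr and returns a non-zero status if at least one is absent.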