CLI Reference

Fairway provides a command-line interface for data ingestion and pipeline management.

Installation

pip install fairway
# Or with all optional dependencies:
pip install "fairway[all]"

Commands

`fairway init`

Initialize a new fairway project with configuration templates.

fairway init [PROJECT_NAME]

Creates:

- `fairway.yaml` - Main configuration file
- `Makefile` - Build and run shortcuts
- `scripts/` - HPC and driver scripts

`fairway run`

Run the ingestion pipeline (Worker Mode).

fairway run [OPTIONS]

Option	Default	Description
`--config TEXT`	Auto-discover	Path to config file
`--spark-master TEXT`	None	Spark master URL (e.g., `spark://host:port` or `local[*]`)
`--dry-run`	False	Show matched files without processing
`--log-file TEXT`	`logs/fairway.jsonl`	Path to JSONL log file (empty string to disable)
`--log-level`	`INFO`	Log level: DEBUG, INFO, WARNING, ERROR

Examples:

# Run with auto-discovered config
fairway run

# Run with specific config
fairway run --config config/production.yaml

# Dry run to see what would be processed
fairway run --dry-run

# Run with debug logging
fairway run --log-level DEBUG

# Run with custom log file
fairway run --log-file /path/to/pipeline.jsonl

`fairway generate-schema`

Generate schema from data files.

fairway generate-schema [OPTIONS]

Scans source files and infers column types using a two-phase approach:

1. Phase 1: Discover all columns from all files
2. Phase 2: Sample files for type inference
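The two phases can be illustrated with a small sketch. This is not fairway's actual implementation: it assumes CSV input held in memory, and the helper names (`discover_columns`, `infer_type`, `infer_schema`), the `sample_size` parameter, and the three type labels are all hypothetical.

```python
import csv
import io


def discover_columns(files):
    """Phase 1: read only the header of every file and collect the union of column names."""
    columns = []
    for text in files.values():
        header = next(csv.reader(io.StringIO(text)), [])
        for name in header:
            if name not in columns:
                columns.append(name)
    return columns


def infer_type(values):
    """Return the narrowest of int, float, or string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # nothing observed, fall back to the widest type
    for type_name, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(files, sample_size=2):
    """Phase 2: read rows from a sample of the files and infer a type per column."""
    columns = discover_columns(files)
    observed = {name: [] for name in columns}
    for text in list(files.values())[:sample_size]:
        for row in csv.DictReader(io.StringIO(text)):
            for name, value in row.items():
                if name in observed and value is not None:
                    observed[name].append(value)
    return {name: infer_type(observed[name]) for name in columns}
```

Here `files` is a hypothetical mapping of file name to CSV text. Because phase 1 reads every header, a column that appears only in an unsampled file still ends up in the schema; in this sketch it falls back to `string`.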

`fairway build`

Build the container image (Apptainer preferred, Docker fallback).

fairway build [OPTIONS]

Option	Description
`--apptainer`	Build Apptainer container (default)
`--docker`	Build Docker container
`--force`	Overwrite existing image

`fairway spark`

Manage Spark clusters for distributed processing.

fairway spark [SUBCOMMAND]

Subcommands:

- `start` - Start a Spark cluster
- `stop` - Stop the Spark cluster
- `status` - Show cluster status

`fairway status`

Show status of submitted Slurm jobs.

fairway status

A wrapper around `squeue` with fairway-specific formatting.

`fairway submit`

Submit the pipeline as a Slurm job with optional Spark cluster provisioning.

fairway submit [OPTIONS]

Option	Default	Description
`--config TEXT`	Auto-discover	Path to config file
`--account TEXT`	From spark.yaml	Slurm account
`--partition TEXT`	`day`	Slurm partition
`--time TEXT`	`24:00:00`	Time limit (HH:MM:SS)
`--mem TEXT`	`16G`	Memory per node
`--cpus INTEGER`	`4`	CPUs per task
`--with-spark`	False	Start Spark cluster before running
`--dry-run`	False	Print job script without submitting

Examples:

# Submit with auto-discovered config
fairway submit

# Submit with Spark cluster
fairway submit --with-spark

# Submit with custom resources
fairway submit --with-spark --mem 64G --cpus 8 --time 48:00:00

# Preview the job script
fairway submit --with-spark --dry-run

`fairway summarize`

Generate summary statistics and reports for already-ingested data. Use this after running `fairway run --skip-summary` to generate summaries as a separate step, which is useful on HPC systems where ingestion and summarization have different resource needs.

fairway summarize [OPTIONS]

Option	Default	Description
`--config TEXT`	Auto-discover	Path to config file
`--spark-master TEXT`	None	Spark master URL
`--slurm`	False	Submit as a Slurm job (loads Spark/Java modules)
`--account TEXT`	From spark.yaml	Slurm account
`--partition TEXT`	`day`	Slurm partition
`--time TEXT`	`04:00:00`	Slurm time limit
`--mem TEXT`	`32G`	Slurm memory
`--cpus INTEGER`	`4`	Slurm CPUs per task
`--log-file TEXT`	`logs/fairway.jsonl`	Path to JSONL log file
`--log-level`	`INFO`	Log level

Examples:

# Run summarization locally
fairway summarize

# Submit as Slurm job
fairway summarize --slurm

# Submit with custom resources
fairway summarize --slurm --mem 64G --time 08:00:00

`fairway cancel`

Cancel Slurm jobs (a wrapper around `scancel`).

fairway cancel [JOB_ID]
fairway cancel --all

Option	Description
`JOB_ID`	Specific job ID to cancel
`--all`	Cancel all your running jobs (requires confirmation)

`fairway cache`

Manage fairway cache (extracted archives, manifests).

fairway cache [SUBCOMMAND]

Subcommands:

- `clear` - Clear cached data
- `status` - Show cache usage

`fairway eject`

Eject bundled scripts and container definitions for customization.

fairway eject [OPTIONS]

Option	Default	Description
`--scripts`	False	Eject only Slurm/HPC scripts
`--container`	False	Eject only container files (Apptainer.def, Dockerfile)
`-o, --output TEXT`	`.`	Output directory
`--force`	False	Overwrite existing files without prompting

Examples:

# Eject everything (container files + scripts)
fairway eject

# Eject only scripts to customize Slurm workflows
fairway eject --scripts

# Eject only container definitions
fairway eject --container

# Eject to a custom directory
fairway eject --output custom/

# Force overwrite existing files
fairway eject --force

Ejected Files:

Container files:

- `Apptainer.def` - Apptainer container definition
- `Dockerfile` - Docker container definition
- `.dockerignore` - Docker ignore patterns
- `Makefile` - Build and run commands

Scripts:

- `scripts/driver.sh` - Slurm driver job script
- `scripts/driver-schema.sh` - Schema generation driver
- `scripts/fairway-spark-start.sh` - Spark cluster startup
- `scripts/fairway-hpc.sh` - HPC utilities

`fairway shell`

Enter an interactive shell inside the fairway container.

fairway shell

`fairway pull`

Pull (mirror) the Apptainer container from the registry.

fairway pull

`fairway logs`

View and filter structured pipeline logs (JSONL format).

fairway logs [OPTIONS]

Option	Default	Description
`-f, --file TEXT`	`logs/fairway.jsonl`	Path to JSONL log file
`-l, --level`	None	Filter by log level: DEBUG, INFO, WARNING, ERROR
`-b, --batch TEXT`	None	Filter by batch ID (supports partial match)
`-n, --last INTEGER`	0	Show only the last N entries (0 shows all entries)
`--json`	False	Output raw JSON instead of formatted text
`--errors`	False	Shortcut for `--level ERROR`

Examples:

# Show all logs
fairway logs

# Show last 20 entries
fairway logs --last 20

# Show only errors (both forms are equivalent)
fairway logs --errors
fairway logs --level ERROR

# Filter by batch ID (partial match)
fairway logs --batch claims_CT_2023

# Raw JSON output (pipe to jq for advanced queries)
fairway logs --json | jq 'select(.level == "ERROR")'

# Custom log file
fairway logs --file /path/to/other.jsonl

Output Format:

2026-02-06T10:00:00 [INFO] Starting ingestion for dataset: sales
2026-02-06T10:00:01 [INFO] [sales_2023_01_abc123] Processing batch 1/3
2026-02-06T10:00:15 [ERROR] [sales_2023_01_abc123] Batch failed: OOM
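
For processing the JSONL file directly rather than through `fairway logs`, the following sketch mirrors the `--level`, `--batch`, and `--last` filters. Only the `level` field is confirmed by the `jq` example above; the `batch_id` field name, the exact-match behavior for levels, and the function name `filter_logs` are assumptions to check against your own log file.

```python
import json


def filter_logs(lines, level=None, batch=None, last=0):
    """Filter JSONL log lines by level, partial batch ID, and trailing count."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        # Assumption: levels are compared by exact match.
        if level and entry.get("level") != level:
            continue
        # Assumption: the batch identifier is stored under "batch_id".
        if batch and batch not in str(entry.get("batch_id", "")):
            continue
        entries.append(entry)
    # last=0 keeps every matching entry.
    return entries[-last:] if last > 0 else entries
```

`lines` can be any iterable of strings, so an open file handle for `logs/fairway.jsonl` can be passed in directly.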

`fairway manifest`

Inspect and query the file manifest (tracks processed files).

fairway manifest [SUBCOMMAND] [OPTIONS]

Subcommands:

- `show` - Display manifest entries
- `query` - Query files by status or batch
- `reset` - Reset file status (for reprocessing)

Environment Variables

Variable	Description	Default
`FAIRWAY_BINDS`	Additional Apptainer bind paths (comma-separated)	Auto-detected from config
`FAIRWAY_TEMP`	Temporary directory for large operations (archive extraction, scratch)	System temp
`REDIVIS_API_TOKEN`	API token for Redivis data export	None (required for export)
`SPARK_LOCAL_IP`	Spark driver bind address	Auto-detect
`PYSPARK_SUBMIT_ARGS`	Additional Spark submit arguments	Auto-configured

Note: `FAIRWAY_BINDS` is useful on HPC clusters whose shared filesystems are mounted at site-specific paths (e.g., `/scratch`, `/gpfs`, `/project`). If auto-detection does not find your cluster's shared storage, set this variable to the path or paths that should be bound.
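
When fairway is launched from a Python driver script, these variables can be set per invocation instead of exported globally. The sketch below builds such an environment; the helper name `fairway_env` and its parameters are hypothetical, and the only facts taken from the table above are the variable names and the comma-separated format of `FAIRWAY_BINDS`.

```python
import os


def fairway_env(bind_paths, temp_dir=None, base_env=None):
    """Return a copy of the environment with fairway variables set."""
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    # FAIRWAY_BINDS takes comma-separated bind paths.
    env["FAIRWAY_BINDS"] = ",".join(bind_paths)
    if temp_dir is not None:
        env["FAIRWAY_TEMP"] = temp_dir
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run(["fairway", "run"], env=fairway_env(["/scratch", "/gpfs"]))`.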

Exit Codes

Code	Meaning
0	Success
1	General error
2	Configuration error
115	Data integrity error (RULE-115)
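
A wrapper script can use these codes to decide how to react to a failed run. The following is a minimal sketch: the mapping is copied from the table above, while the function names `describe_exit_code` and `run_and_report` are hypothetical and not part of fairway.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "Success",
    1: "General error",
    2: "Configuration error",
    115: "Data integrity error (RULE-115)",
}


def describe_exit_code(code):
    """Translate an exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def run_and_report(argv):
    """Run a command and return its exit code together with the meaning."""
    completed = subprocess.run(argv)
    return completed.returncode, describe_exit_code(completed.returncode)
```

For example, `run_and_report(["fairway", "run", "--config", "config/production.yaml"])` returns a `(code, meaning)` pair that a scheduler hook could log, or use to tell a configuration problem (2) apart from a data integrity failure (115).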