Skip to content

CLI Reference

Fairway provides a command-line interface for data ingestion and pipeline management.

Installation

pip install fairway
# Or with all optional dependencies:
pip install fairway[all]

Commands

fairway init

Initialize a new fairway project with configuration templates.

fairway init [PROJECT_NAME]

Creates: - fairway.yaml - Main configuration file - Makefile - Build and run shortcuts - scripts/ - HPC and driver scripts

fairway run

Run the ingestion pipeline (Worker Mode).

fairway run [OPTIONS]
Option Default Description
--config TEXT Auto-discover Path to config file
--spark-master TEXT None Spark master URL (e.g., spark://host:port or local[*])
--dry-run False Show matched files without processing
--log-file TEXT logs/fairway.jsonl Path to JSONL log file (empty string to disable)
--log-level INFO Log level: DEBUG, INFO, WARNING, ERROR

Examples:

# Run with auto-discovered config
fairway run

# Run with specific config
fairway run --config config/production.yaml

# Dry run to see what would be processed
fairway run --dry-run

# Run with debug logging
fairway run --log-level DEBUG

# Run with custom log file
fairway run --log-file /path/to/pipeline.jsonl

fairway generate-schema

Generate schema from data files.

fairway generate-schema [OPTIONS]

Scans source files and infers column types using the two-phase approach: 1. Phase 1: Discover all columns from all files 2. Phase 2: Sample files for type inference

fairway build

Build the container image (Apptainer preferred, Docker fallback).

fairway build [OPTIONS]
Option Description
--apptainer Build Apptainer container (default)
--docker Build Docker container
--force Overwrite existing image

fairway spark

Manage Spark clusters for distributed processing.

fairway spark [SUBCOMMAND]

Subcommands: - start - Start a Spark cluster - stop - Stop the Spark cluster - status - Show cluster status

fairway status

Show status of submitted Slurm jobs.

fairway status

Wrapper around squeue with fairway-specific formatting.

fairway submit

Submit the pipeline as a Slurm job with optional Spark cluster provisioning.

fairway submit [OPTIONS]
Option Default Description
--config TEXT Auto-discover Path to config file
--account TEXT From spark.yaml Slurm account
--partition TEXT day Slurm partition
--time TEXT 24:00:00 Time limit (HH:MM:SS)
--mem TEXT 16G Memory per node
--cpus INTEGER 4 CPUs per task
--with-spark False Start Spark cluster before running
--dry-run False Print job script without submitting

Examples:

# Submit with auto-discovered config
fairway submit

# Submit with Spark cluster
fairway submit --with-spark

# Submit with custom resources
fairway submit --with-spark --mem 64G --cpus 8 --time 48:00:00

# Preview the job script
fairway submit --with-spark --dry-run

fairway summarize

Generate summary stats and reports for already-ingested data. Use this after running fairway run --skip-summary to generate summaries in a separate step (useful on HPC where ingestion and summarization have different resource needs).

fairway summarize [OPTIONS]
Option Default Description
--config TEXT Auto-discover Path to config file
--spark-master TEXT None Spark master URL
--slurm False Submit as a Slurm job (loads Spark/Java modules)
--account TEXT From spark.yaml Slurm account
--partition TEXT day Slurm partition
--time TEXT 04:00:00 Slurm time limit
--mem TEXT 32G Slurm memory
--cpus INTEGER 4 Slurm CPUs per task
--log-file TEXT logs/fairway.jsonl Path to JSONL log file
--log-level INFO Log level

Examples:

# Run summarization locally
fairway summarize

# Submit as Slurm job
fairway summarize --slurm

# Submit with custom resources
fairway summarize --slurm --mem 64G --time 08:00:00

fairway cancel

Cancel Slurm jobs (wrapper around scancel).

fairway cancel [JOB_ID]
fairway cancel --all
Option Description
JOB_ID Specific job ID to cancel
--all Cancel all your running jobs (requires confirmation)

fairway cache

Manage fairway cache (extracted archives, manifests).

fairway cache [SUBCOMMAND]

Subcommands: - clear - Clear cached data - status - Show cache usage

fairway eject

Eject bundled scripts and container definitions for customization.

fairway eject [OPTIONS]
Option Default Description
--scripts False Eject only Slurm/HPC scripts
--container False Eject only container files (Apptainer.def, Dockerfile)
-o, --output TEXT . Output directory
--force False Overwrite existing files without prompting

Examples:

# Eject everything (container files + scripts)
fairway eject

# Eject only scripts to customize Slurm workflows
fairway eject --scripts

# Eject only container definitions
fairway eject --container

# Eject to a custom directory
fairway eject --output custom/

# Force overwrite existing files
fairway eject --force

Ejected Files:

Container files: - Apptainer.def - Apptainer container definition - Dockerfile - Docker container definition - .dockerignore - Docker ignore patterns - Makefile - Build and run commands

Scripts: - scripts/driver.sh - Slurm driver job script - scripts/driver-schema.sh - Schema generation driver - scripts/fairway-spark-start.sh - Spark cluster startup - scripts/fairway-hpc.sh - HPC utilities

fairway shell

Enter an interactive shell inside the fairway container.

fairway shell

fairway pull

Pull (mirror) the Apptainer container from the registry.

fairway pull

fairway logs

View and filter structured pipeline logs (JSONL format).

fairway logs [OPTIONS]
Option Default Description
-f, --file TEXT logs/fairway.jsonl Path to JSONL log file
-l, --level None Filter by log level: DEBUG, INFO, WARNING, ERROR
-b, --batch TEXT None Filter by batch ID (supports partial match)
-n, --last INTEGER 0 Show only last N entries
--json False Output raw JSON instead of formatted text
--errors False Shortcut for --level ERROR

Examples:

# Show all logs
fairway logs

# Show last 20 entries
fairway logs --last 20

# Show only errors
fairway logs --errors
fairway logs --level ERROR

# Filter by batch ID (partial match)
fairway logs --batch claims_CT_2023

# Raw JSON output (pipe to jq for advanced queries)
fairway logs --json | jq 'select(.level == "ERROR")'

# Custom log file
fairway logs --file /path/to/other.jsonl

Output Format:

2026-02-06T10:00:00 [INFO] Starting ingestion for dataset: sales
2026-02-06T10:00:01 [INFO] [sales_2023_01_abc123] Processing batch 1/3
2026-02-06T10:00:15 [ERROR] [sales_2023_01_abc123] Batch failed: OOM

fairway manifest

Inspect and query the file manifest (tracks processed files).

fairway manifest [SUBCOMMAND] [OPTIONS]

Subcommands: - show - Display manifest entries - query - Query files by status or batch - reset - Reset file status (for reprocessing)

Environment Variables

Variable Description Default
FAIRWAY_BINDS Additional Apptainer bind paths (comma-separated) Auto-detected from config
FAIRWAY_TEMP Temporary directory for large operations (archive extraction, scratch) System temp
REDIVIS_API_TOKEN API token for Redivis data export None (required for export)
SPARK_LOCAL_IP Spark driver bind address Auto-detect
PYSPARK_SUBMIT_ARGS Additional Spark submit arguments Auto-configured

Note: FAIRWAY_BINDS is useful when running on HPC clusters with different filesystem paths (e.g., /scratch, /gpfs, /project). Set this to your cluster's shared storage path if auto-detection doesn't find it.

Exit Codes

Code Meaning
0 Success
1 General error
2 Configuration error
115 Data integrity error (RULE-115)