v0.2.1 (latest release)

Catch data quality issues
before they reach production

Point it at a CSV, Parquet, or JSON file. Pipedog learns what normal looks like — and alerts you the moment something changes.

$ pip install pipedog

See it in action

zsh — pipedog demo
$ pipedog init orders_jan.csv orders_feb.csv --profile orders
Reading orders_jan.csv...
Reading orders_feb.csv...
Merging 2 files into one baseline...
─────────────────────────────────────
Schema snapshot saved to .pipedog/orders/schema.json
2 files merged into baseline
7 columns | 20 rows | 24 quality checks generated
$ pipedog scan orders_mar.csv --profile orders
Reading orders_mar.csv...
─────────────────────────────────────
✓ ALL CHECKS PASSED
10 rows | 7 columns | 24 passed | 0 warnings | 0 failed
$

Everything you need. Nothing you don't.

Pipedog is designed for data engineers and analysts who want fast feedback — not another platform to learn.

Zero Config

Auto-generates quality rules from your own data. No YAML, no contracts, no setup.

📄

Human-Readable Output

Reports written in plain English, not stack traces. Forward them to anyone on your team.

📁

Works with Flat Files

CSV, JSON, and Parquet out of the box. No database required.

🏷️

Named Profiles

Track multiple file types independently with --profile. One tool, many pipelines.

🔁

CI/CD Ready

Exits with code 1 on failure. Drops into GitHub Actions, Airflow, Prefect, or any pipeline that can act on a process exit code.

🌐

HTML Reports

Every scan saves a self-contained HTML report you can open in a browser or email to your team.

How it works

Three steps. No infrastructure. Works from your terminal.

1

Init

pipedog init *.csv --profile sales

Run pipedog init on your historical files. Pipedog learns the baseline — schema, ranges, distributions, unique keys.

2

Scan

pipedog scan new_month.csv --profile sales

Next month, run pipedog scan on the new file. Pipedog compares it against the baseline using every check generated at init (24 in the demo above).

3

Act

exit 0 # or exit 1

Green? Ingest. Red? Fix the data before it reaches your database. HTML report saved automatically.
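The "Act" step can be scripted. Below is a minimal Python sketch, assuming only what is stated above: the `pipedog scan <file> --profile <name>` invocation and the exit code (0 on success, 1 on failure). The helper names `scan_command` and `gate` and the `run` parameter are hypothetical and not part of Pipedog; the sketch treats any non-zero exit code as a failure.

```python
import subprocess
from typing import Callable, List, Sequence


def scan_command(path: str, profile: str) -> List[str]:
    """Build the scan invocation shown in step 2."""
    return ["pipedog", "scan", path, "--profile", profile]


def _run(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(cmd).returncode


def gate(path: str, profile: str,
         run: Callable[[Sequence[str]], int] = _run) -> bool:
    """Return True when the scan exits with code 0, False otherwise.

    `run` is injectable (hypothetical parameter) so the gate can be
    exercised without Pipedog installed.
    """
    return run(scan_command(path, profile)) == 0
```

A pipeline task could call `gate("new_month.csv", "sales")` and proceed with ingestion only when it returns True.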

Auto-generated checks

Pipedog infers which checks to generate based on your data. No manual rule-writing required.

Check            When Generated                                           Severity
not_null         Column had zero nulls at init                            error
null_rate        Column had some nulls; threshold = baseline % + 10 pp    warning
min_value        Numeric column; locks observed minimum                   error
max_value        Numeric column; locks observed maximum                   error
unique           Every value was distinct (key column detection)          error
allowed_values   String/boolean column with ≤ 50 distinct values          error
std_dev_change   Numeric column; flags distribution shift > 50%           warning
row_count        Every file; threshold = 80% of baseline row count        error
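To make the table concrete, here is a rough Python sketch of how rules like these could be inferred from one baseline column. It is not Pipedog's implementation: the function names, the dictionary layout, the use of population standard deviation, and the choice to require zero nulls for `unique` are all assumptions made for illustration. Only the conditions and thresholds come from the table.

```python
from statistics import pstdev
from typing import Any, Dict, List, Optional


def is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def infer_column_checks(values: List[Optional[Any]]) -> List[Dict[str, Any]]:
    """Infer checks for one baseline column; None stands for a null cell."""
    if not values:
        return []
    present = [v for v in values if v is not None]
    null_pct = 100.0 * (len(values) - len(present)) / len(values)
    checks: List[Dict[str, Any]] = []

    if null_pct == 0:
        checks.append({"check": "not_null", "severity": "error"})
    else:
        # Threshold = baseline null percentage + 10 percentage points.
        checks.append({"check": "null_rate", "max_pct": null_pct + 10,
                       "severity": "warning"})

    if present and all(is_number(v) for v in present):
        checks.append({"check": "min_value", "min": min(present),
                       "severity": "error"})
        checks.append({"check": "max_value", "max": max(present),
                       "severity": "error"})
        # Assumption: compare against the population standard deviation.
        checks.append({"check": "std_dev_change",
                       "baseline_std": pstdev(present),
                       "max_change": 0.5, "severity": "warning"})
    elif present and all(isinstance(v, (str, bool)) for v in present):
        distinct = set(present)
        if len(distinct) <= 50:
            checks.append({"check": "allowed_values",
                           "allowed": sorted(distinct, key=str),
                           "severity": "error"})

    # Assumption: a key column has no nulls and no repeated values.
    if null_pct == 0 and len(set(present)) == len(present):
        checks.append({"check": "unique", "severity": "error"})
    return checks


def infer_row_count_check(baseline_rows: int) -> Dict[str, Any]:
    # Threshold = 80% of the baseline row count.
    return {"check": "row_count", "min_rows": 0.8 * baseline_rows,
            "severity": "error"}
```

In this sketch, a column such as `["paid", "paid", None, "refunded"]` has a 25% baseline null rate, so it yields a `null_rate` warning with a 35% threshold plus an `allowed_values` check on the two observed strings.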

Ready to stop bad data at the door?

Install in seconds. No config files. No account required.

$ pip install pipedog