Catch data quality issues
before they reach production
Point it at a CSV, Parquet, or JSON file. Pipedog learns what normal looks like — and alerts you the moment something changes.
See it in action
Everything you need. Nothing you don't.
Pipedog is designed for data engineers and analysts who want fast feedback — not another platform to learn.
Zero Config
Auto-generates quality rules from your own data. No YAML, no contracts, no setup.
Human-Readable Output
Reports written in plain English, not stack traces. Forward them to anyone on your team.
Works with Flat Files
CSV, JSON, and Parquet out of the box. No database required.
Named Profiles
Track multiple file types independently with --profile. One tool, many pipelines.
CI/CD Ready
Exits with code 1 on failure. Drops into GitHub Actions, Airflow, Prefect, or any pipeline.
HTML Reports
Every scan saves a self-contained HTML report you can open in a browser or email to your team.
How it works
Three commands. No infrastructure. Works from your terminal.
Init
pipedog init *.csv --profile salesRun pipedog init on your historical files. Pipedog learns the baseline — schema, ranges, distributions, unique keys.
Scan
pipedog scan new_month.csv --profile salesNext month, run pipedog scan on the new file. Pipedog compares it against the baseline across all 24 checks.
Act
exit 0 # or exit 1Green? Ingest. Red? Fix the data before it reaches your database. HTML report saved automatically.
Auto-generated checks
Pipedog infers which checks to generate based on your data. No manual rule-writing required.
| Check | When Generated | Severity |
|---|---|---|
not_null | Column had zero nulls at init | error |
null_rate | Column had some nulls; threshold = baseline % + 10pp | warning |
min_value | Numeric column; locks observed minimum | error |
max_value | Numeric column; locks observed maximum | error |
unique | Every value was distinct (key column detection) | error |
allowed_values | String/boolean column with ≤ 50 distinct values | error |
std_dev_change | Numeric column; flags distribution shift > 50% | warning |
row_count | Every file; threshold = 80% of baseline row count | error |
Ready to stop bad data at the door?
Install in seconds. No config files. No account required.