102024SCAD
Statistical Data-Quality Anomaly Detector
Monthly economic indicator submissions occasionally contained unit-of-measure errors that weren't caught until publication.
A hybrid statistical and LLM pipeline: classical outlier detection flags suspect rows, then GPT-4 assesses whether each flagged value reflects a real change or a likely data-entry mistake.
- Caught 23 publication-blocking issues across 9 months pre-release
- Reduced false positives by 60% vs. the prior threshold-only system
- Cut publication delays linked to data quality from 4/yr to 0
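
The two-stage idea described above (statistical flagging first, LLM review second) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the project's actual code: the function names, the median/MAD modified z-score, the 3.5 threshold, the power-of-ten check for unit errors, and the prompt wording are all hypothetical. The sketch uses only the standard library and stops at building the review prompt; it does not call any model.

```python
import math
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are roughly comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def looks_like_unit_error(value, baseline, tolerance=0.15):
    """True if value/baseline is close to a power of ten other than 10**0."""
    if value <= 0 or baseline <= 0:
        return False
    exponent = math.log10(value / baseline)
    nearest = round(exponent)
    return nearest != 0 and abs(exponent - nearest) <= tolerance


def flag_suspect_rows(series, threshold=3.5):
    """Return (index, value, hint) for values whose robust z-score exceeds the threshold."""
    baseline = median(series)
    flagged = []
    for i, (value, score) in enumerate(zip(series, robust_z_scores(series))):
        if abs(score) > threshold:
            if looks_like_unit_error(value, baseline):
                hint = "possible unit error"
            else:
                hint = "unexplained outlier"
            flagged.append((i, value, hint))
    return flagged


def build_review_prompt(indicator, series, flagged):
    """Assemble the text a reviewing LLM would receive (hypothetical wording)."""
    lines = [
        f"Indicator: {indicator}",
        f"Monthly values: {series}",
        "For each flagged value, say whether it is a plausible real change "
        "or a likely data-entry mistake, and why.",
    ]
    for i, value, hint in flagged:
        lines.append(f"- index {i}: value {value} ({hint})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up series: one value is off by about a factor of 1000, one is a modest jump.
    monthly = [102.1, 101.8, 103.0, 102500.0, 102.7, 103.4, 118.9]
    suspects = flag_suspect_rows(monthly)
    print(build_review_prompt("example_indicator", monthly, suspects))
```

The median/MAD score is used here because a single extreme value barely moves the median, whereas it would inflate a mean and standard deviation enough to hide itself. The power-of-ten hint is one simple way to separate a probable unit slip (thousands reported as units, for instance) from an ordinary outlier before the row reaches the LLM.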
23 issues caught · -60% false positives · 0 delays
Python · GPT-4 · scikit-learn · Azure Functions