10 · 2024 · SCAD

Statistical Data-Quality Anomaly Detector

Challenge

Monthly economic indicator submissions occasionally contained unit-of-measure errors that weren't caught until publication.

Approach

Hybrid statistics + LLM pipeline: classical outlier detection flags suspect rows, then GPT-4 assesses each flagged value to decide whether it reflects a genuine change or a likely data-entry mistake.
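A minimal Python sketch of the two-stage flow, assuming a robust (median/MAD) z-score stands in for the "classical outlier detection" and a power-of-ten ratio check serves as a unit-of-measure hint. The project does not confirm either technique. The GPT-4 call is abstracted as a hypothetical `ask_llm` callable (prompt in, text out), so no particular SDK is implied. `flag_suspects`, `scale_hint`, `build_prompt`, `triage` and the `REAL_CHANGE` / `LIKELY_ENTRY_ERROR` / `NEEDS_REVIEW` labels are illustrative names.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Flag:
    period: str
    value: float
    robust_z: float
    scale_hint: Optional[float]  # e.g. 1000.0 if the value is ~1000x the median


def robust_z_scores(values: Sequence[float]) -> list[float]:
    """Median/MAD z-scores: less distorted by the outliers being hunted than mean/stdev."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case (over half the values identical): this sketch flags nothing.
        return [0.0] * len(values)
    # 0.6745 rescales MAD so scores are comparable to standard z-scores under normality.
    return [0.6745 * (v - med) / mad for v in values]


def scale_hint(value: float, reference: float, tol: float = 0.15) -> Optional[float]:
    """Return the power-of-ten factor (10, 100, 1000, 1e6 or its inverse) that
    value/reference is within `tol` of, else None. Hypothetical heuristic."""
    if value == 0 or reference == 0:
        return None
    ratio = abs(value / reference)
    for factor in (10.0, 100.0, 1000.0, 1e6):
        for candidate in (factor, 1.0 / factor):
            if abs(ratio / candidate - 1.0) <= tol:
                return candidate
    return None


def flag_suspects(periods: Sequence[str], values: Sequence[float],
                  z_threshold: float = 3.5) -> list[Flag]:
    """Stage 1: purely statistical flagging of suspect rows."""
    med = statistics.median(values)
    return [
        Flag(p, v, round(z, 2), scale_hint(v, med))
        for p, v, z in zip(periods, values, robust_z_scores(values))
        if abs(z) >= z_threshold
    ]


def build_prompt(series: str, flag: Flag, median: float) -> str:
    hint = ""
    if flag.scale_hint is not None:
        hint = (f"The value is about {flag.scale_hint:g}x the series median, "
                "which may indicate a unit-of-measure error.\n")
    return (
        f"Series: {series}\nPeriod: {flag.period}\nValue: {flag.value}\n"
        f"Series median: {median}\nRobust z-score: {flag.robust_z}\n{hint}"
        "Reply with REAL_CHANGE or LIKELY_ENTRY_ERROR, then one sentence of reasoning."
    )


def triage(series: str, periods: Sequence[str], values: Sequence[float],
           ask_llm: Callable[[str], str]) -> list[tuple[Flag, str]]:
    """Stage 2: query the LLM only for flagged rows; unparseable replies go to a human."""
    med = statistics.median(values)
    results = []
    for flag in flag_suspects(periods, values):
        reply = ask_llm(build_prompt(series, flag, med)).strip().upper()
        if reply.startswith("LIKELY_ENTRY_ERROR"):
            verdict = "LIKELY_ENTRY_ERROR"
        elif reply.startswith("REAL_CHANGE"):
            verdict = "REAL_CHANGE"
        else:
            verdict = "NEEDS_REVIEW"
        results.append((flag, verdict))
    return results


if __name__ == "__main__":
    # Stub standing in for a GPT-4 call; a real deployment would query the model here.
    def stub_llm(prompt: str) -> str:
        if "unit-of-measure" in prompt:
            return "LIKELY_ENTRY_ERROR: looks like a units slip."
        return "REAL_CHANGE"

    periods = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"]
    values = [101.2, 102.0, 101.7, 102.5, 102_300.0, 103.1]  # made-up illustrative data
    for flag, verdict in triage("example_index", periods, values, stub_llm):
        print(flag.period, flag.value, flag.scale_hint, verdict)
```

In this sketch the statistical stage is deterministic and cheap, so the LLM only sees the few rows already flagged. Replies that match neither label fall back to `NEEDS_REVIEW` rather than being silently accepted.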

Impact

  • Caught 23 publication-blocking issues before release over a 9-month period
  • Reduced false positives by 60% compared with the prior threshold-only system
  • Cut data-quality-related publication delays from 4 per year to 0

Tech stack

Python · GPT-4 · scikit-learn · Azure Functions
