03 · 2025 · SCAD

Document Processing & Vision AI Pipeline

1,000+ documents a day. PDFs, scans, handwriting, Arabic, English, tables. Each one structured in 30 seconds.

Architecture

[Architecture diagram: vision.pipeline (live)]

Before

Manual data entry consumed 2,000+ staff hours a month. Backlogs grew. Errors were silent until they showed up in published statistics weeks later.

After

80% of incoming documents are now classified, extracted, validated, and routed automatically. Human reviewers focus on the 20% the system flags as low-confidence.

Challenge

Manual processing of 1,000+ documents a day (PDFs, scanned images, forms).

Approach

Built an intelligent document processing pipeline using GPT-4 Vision and Azure Document Intelligence (formerly Azure Form Recognizer).

How it was built

1 · Document taxonomy · Weeks 1–2
Catalogued the 14 distinct document types coming through the queue. Defined the structured schema each type should output. Vision AI without this becomes a guessing game.

2 · Azure Document Intelligence baseline · Weeks 3–4
Layout + field extraction got us to 70% accuracy on structured forms. Tables and handwritten Arabic remained painful.

3 · GPT-4 Vision for hard cases · Weeks 5–7
Routed handwritten + mixed-language documents to GPT-4 Vision with structured-output prompting. Lifted accuracy on hard cases from 50% to 89%.

4 · Confidence-aware human-in-loop · Weeks 8–9
Per-field confidence scores → low-confidence fields highlighted in a reviewer UI. Reviewers correct in seconds instead of re-keying entire documents.

5 · Cosmos DB + downstream integration · Weeks 10–12
Structured output flows into Cosmos DB → triggers downstream analytics pipelines → appears in dashboards. End-to-end traceability from scan to chart.
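
The scan-to-chart traceability in step 5 comes down to every structured record carrying its own provenance. Here is a minimal sketch of that idea, not the production schema. The field names (source_blob, extracted_by), the document type household_form, the example path, and the plain list standing in for a Cosmos DB container are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ExtractedRecord:
    """One structured document, shaped like an item that could be stored in a database."""
    id: str            # hypothetical record id
    doc_type: str      # one of the taxonomy's document types (name is made up here)
    source_blob: str   # hypothetical storage path of the original scan
    extracted_by: str  # which model produced the fields
    fields: dict = field(default_factory=dict)


def trace_to_source(records, record_id: str) -> Optional[str]:
    """Return the scan path behind a record id, or None if the id is unknown."""
    for record in records:
        if record.id == record_id:
            return record.source_blob
    return None


# A plain list stands in for the database container in this sketch.
store = [
    ExtractedRecord(
        id="rec-001",
        doc_type="household_form",
        source_blob="scans/2025/03/form-001.pdf",
        extracted_by="document-intelligence",
        fields={"household_size": 5},
    )
]

# asdict() gives a JSON-serializable dict, the shape a document store expects.
item = asdict(store[0])
```

Because the path to the original scan travels with the record, any downstream number built from rec-001 can be traced back with trace_to_source(store, "rec-001").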

Key architecture decisions

Azure Document Intelligence + GPT-4 Vision (not just one)

Why · Azure Document Intelligence (ADI) handles structured forms cheaply. GPT-4 Vision (GPT-4V) handles the unstructured chaos: handwriting and mixed-language pages. Routing by document type uses each model where it's strongest.
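
The routing rule can be sketched in a few lines. This is an illustration under assumptions, not the deployed router: the set of structured types, the handwritten and mixed_language flags, and the choice to treat unknown layouts as hard cases are all hypothetical.

```python
from enum import Enum


class Extractor(Enum):
    DOCUMENT_INTELLIGENCE = "azure-document-intelligence"
    GPT4_VISION = "gpt-4-vision"


# Hypothetical subset of the taxonomy that counts as "structured forms".
STRUCTURED_TYPES = {"printed_form", "invoice"}


def choose_extractor(doc_type: str, handwritten: bool, mixed_language: bool) -> Extractor:
    """Send hard cases to the vision model and everything else to the cheaper one."""
    if handwritten or mixed_language:
        return Extractor.GPT4_VISION
    if doc_type in STRUCTURED_TYPES:
        return Extractor.DOCUMENT_INTELLIGENCE
    # Unknown or free-form layouts are treated as hard cases in this sketch.
    return Extractor.GPT4_VISION
```

A clean printed form goes to the cheaper extractor, while the same form filled in by hand goes to GPT-4 Vision.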

Per-field confidence scores

Why · Without per-field confidence, the only options are "trust everything" or "review everything." Confidence-gated review is what makes 80% automation safe.
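
Confidence gating is simple to express. The sketch below assumes each extracted field arrives as a value plus a confidence between 0 and 1. The 0.85 threshold and the field names are made up for the example; the real cut-off would be tuned per document type.

```python
REVIEW_THRESHOLD = 0.85  # assumed cut-off; the right value depends on the data


def fields_needing_review(fields: dict, threshold: float = REVIEW_THRESHOLD) -> list:
    """Return the names of fields whose confidence falls below the threshold."""
    return sorted(
        name for name, entry in fields.items() if entry["confidence"] < threshold
    )


def route(fields: dict, threshold: float = REVIEW_THRESHOLD) -> tuple:
    """Auto-accept a document only when every field clears the threshold."""
    flagged = fields_needing_review(fields, threshold)
    if flagged:
        return ("human_review", flagged)
    return ("auto_accept", [])


# Hypothetical extraction result for one document.
extraction = {
    "full_name": {"value": "A. Example", "confidence": 0.97},
    "date_of_birth": {"value": "1990-01-01", "confidence": 0.62},
}

decision = route(extraction)  # ("human_review", ["date_of_birth"])
```

The reviewer sees only date_of_birth highlighted, which is what turns a full re-key into a correction that takes seconds.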

Strict JSON schema output

Why · Downstream systems break on shape changes. Schema-enforced output prevents "silent" extraction errors from corrupting databases.
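
Schema enforcement can be as small as a strict parser sitting between the model and the database. The sketch below uses only the standard library. The household_form schema and its two fields are hypothetical, and a real pipeline would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical schema for one document type: field name -> required Python type.
SCHEMAS = {
    "household_form": {"household_size": int, "district": str},
}


class SchemaError(ValueError):
    """Raised when model output does not match the expected shape."""


def parse_strict(doc_type: str, raw_json: str) -> dict:
    """Parse model output and reject missing, extra, or wrongly typed fields."""
    schema = SCHEMAS[doc_type]
    data = json.loads(raw_json)  # malformed JSON raises json.JSONDecodeError
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be an object")
    missing = schema.keys() - data.keys()
    extra = data.keys() - schema.keys()
    if missing or extra:
        raise SchemaError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in schema.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            raise SchemaError(f"{name}: expected {expected.__name__}")
    return data
```

Output such as {"household_size": "5", ...} fails loudly here instead of landing in the database as a string where a number was expected.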

Impact

Automated 80% of document classification and data extraction tasks
Reduced processing time from 15 minutes to 30 seconds per document
94% accuracy on structured form extraction
Saved 2,000+ staff hours monthly

What I'd tell someone building this

01 · Two models with clear routing beat one expensive model trying to do everything.
02 · Confidence is the unsung hero of human-in-loop AI systems.
03 · Define your output schema before you pick your model.
04 · Arabic handwriting is still hard. Reviewer UX matters more than chasing the last 5% of accuracy.

“The pipeline saves the team two thousand hours every single month. But the bigger win is the confidence dashboard — we can now point at any number in our reports and trace it back to the source document.”

— Operations Lead, Census Programme · SCAD

Tech stack

GPT-4 Vision · Azure Form Recognizer · Azure Functions · Blob Storage · Cosmos DB
