032025SCAD

Document Processing & Vision AI Pipeline

1,000+ documents a day. PDFs, scans, handwriting, Arabic, English, tables. Each one structured in 30 seconds.

Architecture

vision.pipeline

  • Intake: inbox of 1K documents a day (PDF 420, JPG 180, PNG 150, FORM 210, SCAN 40). Arabic + English content, handwriting, tables, 14 document types.
  • Route: a classifier routes each document by type.
  • Extract · Merge: Form Recognizer for structured forms, GPT-4 Vision for unstructured and Arabic content, Tesseract OCR for legacy scans. Merge + Enrich attaches per-field confidence.
  • Validate · Deliver: structured records land in Cosmos DB as { type, fields, conf }, for example emp_id: "12345" (0.99), date: "2024-01" (0.94), notes: ar+en (0.71). If conf < 0.85 → human review queue.

80% automated · 94% extraction · 2K+ hrs saved/mo · 30s vs 15min · 14 doc types
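
The "Merge + Enrich" stage in the diagram could look roughly like the sketch below. It assumes that when two extractors return the same field, the candidate with the higher confidence wins; the write-up does not say how conflicts are actually resolved. `FieldValue`, `merge_fields`, the extractor names, and the 0.80 score are hypothetical, while the other sample values come from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldValue:
    value: str
    conf: float   # extractor-reported confidence in [0, 1]
    source: str   # which extractor produced the value (hypothetical label)


def merge_fields(*extractions: dict[str, FieldValue]) -> dict[str, FieldValue]:
    """Keep the highest-confidence candidate for each field name.

    Assumption: "highest confidence wins" is an illustrative merge rule,
    not necessarily the one used in the real pipeline.
    """
    merged: dict[str, FieldValue] = {}
    for extraction in extractions:
        for name, candidate in extraction.items():
            best = merged.get(name)
            if best is None or candidate.conf > best.conf:
                merged[name] = candidate
    return merged


# Two passes over the same document; the 0.80 score is invented for the example.
form_pass = {
    "emp_id": FieldValue("12345", 0.99, "form_recognizer"),
    "date": FieldValue("2024-01", 0.80, "form_recognizer"),
}
vision_pass = {
    "date": FieldValue("2024-01", 0.94, "gpt4_vision"),
    "notes": FieldValue("ar+en", 0.71, "gpt4_vision"),
}

record = merge_fields(form_pass, vision_pass)
```

Keeping `source` on every field is one way to preserve the scan-to-chart traceability the project aims for.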

Before

Manual data entry consumed 2,000+ staff hours a month. Backlogs grew. Errors were silent until they showed up in published statistics weeks later.

After

80% of incoming documents are now classified, extracted, validated, and routed automatically. Human reviewers focus on the 20% the system flags as low-confidence.

Challenge

Manual processing of 1,000+ documents a day (PDFs, scanned images, forms).

Approach

Built an intelligent document processing pipeline using GPT-4 Vision and Azure Form Recognizer (now Azure Document Intelligence).

How it was built

  1. Document taxonomy

    Weeks 1–2

    Catalogued the 14 distinct document types coming through the queue. Defined the structured schema each type should output. Vision AI without this becomes a guessing game.

  2. Azure Document Intelligence baseline

    Weeks 3–4

    Layout + field extraction got us to 70% accuracy on structured forms. Tables and handwritten Arabic remained painful.

  3. GPT-4 Vision for hard cases

    Weeks 5–7

    Routed handwritten + mixed-language documents to GPT-4 Vision with structured-output prompting. Lifted accuracy on hard cases from 50% to 89%.

  4. Confidence-aware human-in-loop

    Weeks 8–9

    Per-field confidence scores → low-confidence fields highlighted in a reviewer UI. Reviewers correct in seconds instead of re-keying entire documents.

  5. Cosmos DB + downstream integration

    Weeks 10–12

    Structured output flows into Cosmos DB → triggers downstream analytics pipelines → appears in dashboards. End-to-end traceability from scan to chart.
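
The "structured-output prompting" in step 3 might be sketched as follows. This is a minimal illustration, not the project's actual prompt. The field names, the wording, and the idea of asking the model to report its own confidence per field are all assumptions. No API call is made here: `build_prompt` only produces the instruction text, and `parse_reply` only checks that the reply is a JSON object.

```python
import json


def build_prompt(doc_type: str, fields: list[str]) -> str:
    """Build an instruction that asks for JSON only, one entry per field.

    Assumption: the model is asked to self-report a confidence per field.
    """
    shape = {name: {"value": "<string or null>", "conf": "<number 0..1>"} for name in fields}
    return (
        f"Extract the following fields from this {doc_type} document.\n"
        "Reply with a single JSON object and nothing else, in this shape:\n"
        + json.dumps(shape, indent=2)
    )


def parse_reply(reply: str) -> dict:
    """Parse the model reply, raising ValueError if it is not a JSON object."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Hypothetical field list for one of the 14 document types.
prompt = build_prompt("leave form", ["emp_id", "date", "notes"])
parsed = parse_reply('{"emp_id": {"value": "12345", "conf": 0.99}}')
```

Rejecting anything that is not a bare JSON object keeps chatty or truncated replies from reaching later stages.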

Key architecture decisions

Azure Document Intelligence + GPT-4 Vision (not just one)

Why · ADI handles structured forms cheaply. GPT-4V handles unstructured chaos. Routing by document type uses each model where it's strongest.
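
A routing table along these lines is one way to express that decision. The document type names are hypothetical (the write-up does not list the 14 types), and sending unknown types to GPT-4 Vision is an assumed fallback rather than documented behaviour.

```python
from enum import Enum


class Extractor(Enum):
    FORM_RECOGNIZER = "form_recognizer"  # structured forms
    GPT4_VISION = "gpt4_vision"          # unstructured, handwritten, Arabic
    TESSERACT = "tesseract"              # legacy scans


# Hypothetical document type names, used only for illustration.
ROUTES = {
    "leave_form": Extractor.FORM_RECOGNIZER,
    "free_text_letter": Extractor.GPT4_VISION,
    "legacy_scan": Extractor.TESSERACT,
}


def route(doc_type: str, handwritten: bool = False, mixed_language: bool = False) -> Extractor:
    """Pick an extractor for a classified document.

    Handwritten or mixed-language documents go to GPT-4 Vision regardless of
    type, as described in step 3. Unknown types also fall back to GPT-4 Vision,
    which is an assumption of this sketch.
    """
    if handwritten or mixed_language:
        return Extractor.GPT4_VISION
    return ROUTES.get(doc_type, Extractor.GPT4_VISION)
```

For example, `route("leave_form")` selects the form extractor, while `route("leave_form", handwritten=True)` escalates the same type to the vision model.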

Per-field confidence scores

Why · Without per-field confidence, the only options are "trust everything" or "review everything." Confidence-gated review is what makes 80% automation safe.
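
As a rough sketch of confidence-gated review, using the 0.85 threshold and the sample record from the architecture diagram. The function name and the tuple layout are assumptions made for the example.

```python
REVIEW_THRESHOLD = 0.85  # from the diagram: conf < 0.85 goes to the human review queue


def fields_needing_review(
    fields: dict[str, tuple[str, float]],
    threshold: float = REVIEW_THRESHOLD,
) -> list[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return [name for name, (_value, conf) in fields.items() if conf < threshold]


# Sample record from the diagram: field name -> (value, confidence).
record = {
    "emp_id": ("12345", 0.99),
    "date": ("2024-01", 0.94),
    "notes": ("ar+en", 0.71),
}

flagged = fields_needing_review(record)  # ["notes"]
auto_approved = not flagged              # False: one field needs a reviewer
```

Only `notes` is surfaced to the reviewer here, which is what lets a correction take seconds instead of a full re-key.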

Strict JSON schema output

Why · Downstream systems break on shape changes. Schema-enforced output prevents "silent" extraction errors from corrupting databases.
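
One way to enforce shape before a record is written is a small validator like the one below. The record layout loosely follows the `{ type, fields, conf }` shape in the diagram, but the schema registry, the `leave_form` type, and the exact rules are assumptions for illustration.

```python
class SchemaError(ValueError):
    """Raised when a record does not match the schema for its document type."""


# Hypothetical registry: document type -> {field name: required?}
SCHEMAS = {
    "leave_form": {"emp_id": True, "date": True, "notes": False},
}


def validate(record: dict) -> dict:
    """Return the record unchanged if it matches its schema, else raise SchemaError."""
    schema = SCHEMAS.get(record.get("type"))
    if schema is None:
        raise SchemaError(f"unknown document type: {record.get('type')!r}")

    fields = record.get("fields")
    if not isinstance(fields, dict):
        raise SchemaError("'fields' must be an object")

    unknown = sorted(set(fields) - set(schema))
    if unknown:
        raise SchemaError(f"unexpected fields: {unknown}")

    missing = sorted(name for name, required in schema.items() if required and name not in fields)
    if missing:
        raise SchemaError(f"missing required fields: {missing}")

    for name, field in fields.items():
        conf = field.get("conf") if isinstance(field, dict) else None
        if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            raise SchemaError(f"field {name!r} needs a confidence between 0 and 1")
    return record


good = validate({
    "type": "leave_form",
    "fields": {
        "emp_id": {"value": "12345", "conf": 0.99},
        "date": {"value": "2024-01", "conf": 0.94},
    },
})
```

Failing loudly at this boundary turns a silent extraction error into a rejected write that someone can see.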

Impact

  • Automated 80% of document classification and data extraction tasks
  • Reduced processing time from 15 minutes to 30 seconds per document
  • 94% accuracy on structured form extraction
  • Saved 2,000+ staff hours monthly

What I'd tell someone building this

  • 01 · Two models with clear routing beat one expensive model trying to do everything.
  • 02 · Confidence is the unsung hero of human-in-loop AI systems.
  • 03 · Define your output schema before you pick your model.
  • 04 · Arabic handwriting is still hard. Reviewer UX matters more than chasing the last 5% of accuracy.

"The pipeline saves the team two thousand hours every single month. But the bigger win is the confidence dashboard: we can now point at any number in our reports and trace it back to the source document."
Operations Lead, Census Programme · SCAD

Tech stack

GPT-4 Vision · Azure Form Recognizer · Azure Functions · Blob Storage · Cosmos DB
