01 · 2025 · SCAD

Enterprise RAG Document Intelligence System

How we replaced 2–3 hours of analyst time per query with under 10 seconds of GPT-4, across 100,000+ government documents.

Architecture

[Architecture diagram: rag.pipeline]

Before

Analysts spent 2–3 hours per query digging through SharePoint folders, PDFs, and legacy reports. Knowledge that existed in the organisation was effectively invisible.

After

Every analyst now gets cited answers in under 10 seconds. The system handles 5,000+ queries a month at 92% accuracy, has been running 24/7 for over a year, and pays for itself many times over each week.

Challenge

100K+ government documents scattered across legacy systems with no unified search.

Approach

Architected an end-to-end RAG pipeline on Azure OpenAI and Azure Cognitive Search, with hybrid retrieval and re-ranking.

How it was built

1 · Discovery · Weeks 1–2
Interviewed 12 analysts across 4 departments and mapped how they actually search. It turned out that 60% of queries were semantic ("what's our methodology for X") rather than keyword lookups. This single insight killed the plan to simply make SharePoint search better.

2 · Prototype · Weeks 3–6
Three prototypes, two failures. v1 used 1024-token chunks (vague answers). v2 used pure vector search (missed exact terms). v3 finally combined hybrid retrieval with re-ranking and crossed the 85% accuracy threshold needed to ship.

3 · Evaluation harness · Weeks 7–8
Built a 200-question gold-standard test set with domain experts. Every code change now runs the eval before merging. This slowed development for 2 weeks, then accelerated everything for the next 12 months.

4 · Production hardening · Weeks 9–12
Citation post-processing, Arabic support, document permissions, rate limiting, observability dashboards. The unsexy 80% that separates a demo from a product.

5 · Launch & iterate · Month 4 to present
Soft launch to 20 analysts, then 200, then org-wide. Weekly review of failure cases. Cost dropped 65% over 6 months through prompt and context optimisation.
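
The pre-merge evaluation gate described in step 3 can be sketched roughly as follows. This is a minimal illustration, not the project's actual harness: the function names, the substring-based `contains_expected` grader, and the toy gold set are all hypothetical, and the 0.85 default simply reuses the ship threshold mentioned in step 2.

```python
def contains_expected(answer, expected):
    """Hypothetical grader: correct if the expected phrase appears in the answer."""
    return expected.lower() in answer.lower()


def run_eval(answer_fn, gold_set, is_correct=contains_expected):
    """Run answer_fn over every gold question; return (accuracy, failed questions)."""
    failures = []
    for item in gold_set:
        answer = answer_fn(item["question"])
        if not is_correct(answer, item["expected"]):
            failures.append(item["question"])
    accuracy = (len(gold_set) - len(failures)) / len(gold_set)
    return accuracy, failures


def passes_gate(accuracy, threshold=0.85):
    """A change may merge only if accuracy stays at or above the threshold."""
    return accuracy >= threshold


# Toy gold set (illustrative data only; the real set had 200 expert-written questions).
GOLD_SET = [
    {"question": "How often is the survey run?", "expected": "quarterly"},
    {"question": "Which unit owns the data?", "expected": "statistics unit"},
    {"question": "What is the sample size?", "expected": "5,000 households"},
    {"question": "Which language is the report in?", "expected": "arabic"},
]
```

Returning the failed questions alongside the score makes the weekly failure review a by-product of the same run rather than a separate job.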

Key architecture decisions

Hybrid retrieval (BM25 + Vector) over pure semantic

Why · Pure vector search missed the exact terms analysts cared about (numbers, acronyms, proper nouns). Reciprocal rank fusion (RRF) of the BM25 and vector rankings gave us +14 points on NDCG@5.
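
For readers unfamiliar with reciprocal rank fusion, here is a minimal sketch of the idea: each document scores 1 / (k + rank) in every ranking it appears in, and the scores are summed. The function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration; the source does not state how fusion was configured in production.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (assumed value; dampens the weight of top ranks).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["doc-3", "doc-1", "doc-2"]     # keyword ranking (toy data)
vector_hits = ["doc-1", "doc-4", "doc-3"]   # embedding ranking (toy data)
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc-1', 'doc-3', 'doc-4', 'doc-2']
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which live on different scales.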

Semantic chunking over fixed-token chunking

Why · Splitting on section boundaries instead of token counts improved accuracy by ~20% before we even touched the model.
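
A rough sketch of splitting on section boundaries is shown below. It rests on assumptions the source does not confirm: headings are detected as markdown-style "#" lines or numbered titles such as "2.1 Scope", `max_chars` is an arbitrary size cap, and oversized sections fall back to paragraph packing. Real government PDFs would need a format-specific heading detector.

```python
import re

# Assumption: a heading is a "#"-prefixed line or a numbered title like "2.1 Scope".
HEADING = re.compile(r"^(#{1,6}\s+\S.*|\d+(\.\d+)*\.?\s+[A-Z].*)$")


def chunk_by_section(text, max_chars=2000):
    """Return a list of {"heading", "text"} chunks split at heading lines."""
    sections = []
    heading, body = "", []
    for line in text.splitlines():
        if HEADING.match(line.strip()):
            if heading or any(l.strip() for l in body):
                sections.append((heading, body))
            heading, body = line.strip(), []
        else:
            body.append(line)
    if heading or any(l.strip() for l in body):
        sections.append((heading, body))

    chunks = []
    for heading, body in sections:
        content = "\n".join(body).strip()
        if len(content) <= max_chars:
            chunks.append({"heading": heading, "text": content})
            continue
        # Oversized section: pack whole paragraphs up to max_chars and keep the
        # heading on every piece. A single paragraph longer than max_chars is
        # emitted as-is rather than cut mid-sentence.
        current = ""
        for para in re.split(r"\n\s*\n", content):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append({"heading": heading, "text": current})
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append({"heading": heading, "text": current})
    return chunks
```

Carrying the heading on each chunk gives the retriever and the model the context that a fixed token window tends to cut off.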

Cross-encoder re-ranking

Why · Bi-encoder similarity is fast but imprecise. Re-ranking the top 50 candidates with Cohere rerank-v3 let us pass fewer, better passages to GPT-4, cutting context window costs by 65% while improving precision.
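
The retrieve-then-rerank pattern can be sketched as below. To stay runnable without an API key, the relevance scorer is passed in as a function, and `overlap_score` is a deliberately crude stand-in, not a cross-encoder and not the Cohere API. In a real system `score_fn` would wrap the hosted re-ranker; all names here are hypothetical.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Score each candidate against the query and keep the top_n best.

    candidates: list of dicts with at least a "text" key.
    score_fn: callable (query, passage_text) -> float, higher is more relevant.
    """
    scored = [(score_fn(query, c["text"]), i, c) for i, c in enumerate(candidates)]
    # Highest score first; the original retrieval order breaks ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [c for _, _, c in scored[:top_n]]


def overlap_score(query, passage):
    """Toy stand-in scorer: fraction of query words that appear in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


candidates = [
    {"id": "a", "text": "annual budget report"},
    {"id": "b", "text": "census methodology for household surveys"},
    {"id": "c", "text": "survey methodology notes"},
]
top = rerank("census methodology", candidates, overlap_score, top_n=2)
print([c["id"] for c in top])  # ['b', 'c']
```

The cost saving comes from `top_n`: only the passages that survive re-ranking are placed in the model's context.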

Citation-by-default in the prompt

Why · Users don't trust answers they can't verify. Structured [SOURCE:doc_id,page_n] tagging turned the system from "helpful" to "trustworthy."
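
One way to post-process such tags is sketched below: extract every [SOURCE:doc_id,page_n] tag from an answer and flag any citation that does not match a passage actually supplied to the model. This assumes page_n is an integer and doc IDs contain no commas or spaces; the function names and sample IDs are hypothetical.

```python
import re

# Matches tags like [SOURCE:DOC-114,7]; assumes integer pages and comma-free IDs.
CITATION = re.compile(r"\[SOURCE:([^,\]\s]+),\s*(\d+)\]")


def extract_citations(answer):
    """Return (doc_id, page) pairs in the order they appear in the answer."""
    return [(doc_id, int(page)) for doc_id, page in CITATION.findall(answer)]


def unverified_citations(answer, retrieved):
    """Return cited pairs that were not among the retrieved (doc_id, page) pairs."""
    return [c for c in extract_citations(answer) if c not in retrieved]


answer = (
    "Sampling is stratified by region [SOURCE:DOC-114,7] "
    "and refreshed yearly [SOURCE:DOC-999,2]."
)
retrieved = {("DOC-114", 7), ("DOC-201", 3)}
print(unverified_citations(answer, retrieved))  # [('DOC-999', 2)]
```

A non-empty result signals a citation the model may have invented, which can be stripped or used to trigger a retry before the answer reaches the user.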

Impact

Reduced information retrieval time from 2–3 hours to under 10 seconds
Achieved 92% accuracy on complex multi-document queries
Processing 5K+ queries monthly with 87% user satisfaction
Cut document research costs by 65% through automation

10s time · 92% accuracy · -65% cost · 5K+/mo queries

What I'd tell someone building this

01 · Evaluation infrastructure pays for itself within a month. Build it first, not last.
02 · Users prefer accurate uncertainty over confident hallucination. Teach the model to say "I don't know."
03 · Chunking strategy and prompt design move the needle 10× more than picking the latest model.
04 · Government Arabic-English content needs first-class language handling — not an afterthought.
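
Lesson 02 can be enforced outside the prompt as well. The sketch below shows one possible guard, assuming each retrieved passage carries a relevance score in [0, 1]: if nothing clears a threshold, the system abstains instead of calling the model. The threshold value, the message wording, and the `generate` callable are hypothetical, not details from the project.

```python
ABSTAIN = "I don't know: none of the retrieved sources is relevant enough to answer this."


def answer_or_abstain(question, passages, generate, min_score=0.5):
    """Call generate() only when at least one passage is relevant enough.

    passages: list of dicts with "text" and "score" keys.
    generate: callable (question, passages) -> str, e.g. a wrapper around the LLM.
    min_score: assumed relevance cut-off; would need tuning against an eval set.
    """
    supported = [p for p in passages if p["score"] >= min_score]
    if not supported:
        return ABSTAIN
    return generate(question, supported)
```

A guard like this also saves a model call on queries the corpus cannot answer.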

“Mazhar's RAG system gave us a year of analyst productivity back in three months. The numbers speak for themselves — and the architecture is clean enough that we extended it to two more departments without his help.”

— Senior Director, Digital Transformation · SCAD

Tech stack

GPT-4 · Azure Cognitive Search · LangChain · Pinecone · Azure Functions · .NET Core · Angular
