Production RAG systems
Retrieval pipelines that scale to millions of documents — chunking, hybrid search, reranking, citation-by-default prompting.
92% accuracy · 100K+ docs
15+ years building production software · 3+ years shipping enterprise LLM systems for government. RAG, NL-to-SQL, document intelligence — the kind that runs 24/7, not the kind that demos well.
Cut document research time from 2 hours to 10 seconds.
Built an AI chatbot now handling 18K+ queries a month.
Made 200+ non-technical staff fluent in SQL — without SQL.
Shipped a vision pipeline saving 2,000 staff hours every month.
Reduced GPT-4 API costs by 38% while improving response quality.
Where I've shipped


What I build with
Logos shown are the property of their respective owners and used here solely to identify past clients and employers. No endorsement or partnership is implied.
Shipped systems for · Trained by
SCAD
Statistics Centre Abu Dhabi

MoHRE
Ministry of Human Resources · UAE
Microsoft AI Developer Program
Trained · Aug–Sep 2025
15+ Years
Shipping production software
/ about
I've spent 15+ years shipping software — from leasing platforms in Lahore to government services in Abu Dhabi to LLM systems running 24/7 today.
The pattern that kept showing up: the hard part is never the model. It's the chunking, the retrieval, the prompts, the evals, the cost. The boring stuff between the demo and the system.
I now spend my days on production LLM architecture — RAG pipelines over 100K+ documents, NL-to-SQL across eight databases, vision pipelines that save thousands of staff hours a month. The work is technical, opinionated, and end-to-end.
Currently · Abu Dhabi, United Arab Emirates. Available from June 2026 for senior IC, principal, or hands-on tech-lead roles.
evidence over claims ✓
/ operating principles
Production beats prototype.
A demo is a hypothesis. Shipped software in front of real users is the only reliable signal.
Architecture is a forcing function.
Choose the system shape that makes the right thing easy and the wrong thing visible.
Measure or it didn't happen.
Latency, accuracy, cost, satisfaction — define them, instrument them, then iterate on numbers.
AI is plumbing, not magic.
Retrieval, evaluation, guardrails, observability — the boring layers are what make the magic work.
“The bottleneck is never the model — it's chunking, retrieval, and prompts.”
/ systems in production
Not marketing diagrams: these are the actual shapes of systems running today at government scale.
rag.pipeline · live
“A demo is a hypothesis. Production is the only evidence.”
/ services
Retrieval pipelines that scale to millions of documents — chunking, hybrid search, reranking, citation-by-default prompting.
92% accuracy · 100K+ docs
NL-to-SQL, copilots, multi-turn agents. Schema injection, execution-aware repair loops, multilingual (Arabic + English).
85%+ SQL accuracy · 200+ users
Computer-vision pipelines with confidence-gated human review. Type-aware routing to the cheapest extractor that works.
2,000+ hrs/month saved
Gold-set evaluation, regression tracking, cost-aware model routing. The boring stuff that turns demos into systems.
Evals are the new unit tests
Few-shot, chain-of-thought, function calling. Caching, model routing, token budgeting — without hurting quality.
38% GPT-4 cost cut
Mentor mid-level engineers, run AI architecture reviews, write the docs everyone actually reads.
15+ yrs, 5+ teams led
/ portfolio
Four flagship systems. Each one was an experiment for someone, somewhere. Now they all run 24/7 in front of real users.
2025 · SCAD
Featured · Architected end-to-end RAG pipeline using Azure OpenAI and Cognitive Search with hybrid retrieval and re-ranking.
2023 – Present · SCAD
Featured · Built natural language to SQL query system with conversational interface and automatic error correction.
2025 · SCAD
Featured · Built intelligent document processing pipeline using GPT-4 Vision and Azure Form Recognizer.
/ stack
Green chips link to a project or article where I actually used the thing — proof, not just labels.
01
Building with frontier models in production.
02
Retrieval pipelines that scale to millions of docs.
03
Orchestrating agents, tools, and memory.
04
Secure, event-driven, cloud-native platforms.
05
End-to-end systems from API to UI.
06
Natural dialogue that resolves real problems.
/ experience
Nov 2022 – Present
Abu Dhabi, UAE
@ Statistics Centre — Abu Dhabi (SCAD)
July 2018 – Nov 2022
UAE
@ Ministry of Human Resources & Emiratisation (MoHRE)
June 2015 – June 2018
Lahore, Pakistan
@ TRG Tech
Dec 2012 – June 2015
Lahore, Pakistan
@ NETSOL Technologies
/ training & programs
Microsoft Official Course completions and self-paced learning paths. I haven’t sat the AI-102 / AZ-305 exams yet — the knowledge is applied daily in production at SCAD.
Training completed · exam not yet attempted
Worked through the full AI-102 curriculum — Azure OpenAI, Cognitive Services, knowledge mining, conversational AI — applied directly in production at SCAD.
Microsoft Learn
Training completed · exam not yet attempted
Self-paced study of architecture design patterns for Azure, identity, governance, data platform, and business-continuity design.
Microsoft Learn
/ live demo
Ask anything about my CV and watch the full retrieval pipeline run in real time — embed, retrieve, rerank, generate with citations. All free-tier infrastructure.
Embed
Your question becomes a 3072-dim vector with Gemini Embedding 001.
Retrieve + Rerank
Pinecone fetches top-10 candidates; Cohere rerank-v3.5 reorders them.
Cited generation
Groq Llama 3.3 70B synthesises a cited answer from the top-5 chunks in ~1 s.
16 chunks · Pinecone serverless · Gemini · Cohere · Groq · $0/month
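For readers who want the shape of those three steps in code, here is a minimal sketch. It is not the demo's actual implementation: the hashed bag-of-words `embed`, the term-overlap `rerank`, and the sample `CHUNKS` are all hypothetical stand-ins for the Gemini embedding call, the Pinecone query, and the Cohere reranker, and the final step only builds the cited prompt rather than calling Groq. Only the flow is taken from the steps above: embed, fetch the top 10, rerank, keep the top 5, generate with citations.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, dim=64):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for tok, count in Counter(tokenize(text)).items():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Dot product of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, chunks, top_k=10):
    """Stand-in for the vector-store query: rank chunks by cosine similarity."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:top_k]


def rerank(query, candidates, top_n=5):
    """Stand-in for a reranker: score by the share of query terms in the chunk."""
    q_terms = set(tokenize(query))

    def score(chunk):
        return len(q_terms & set(tokenize(chunk["text"]))) / max(len(q_terms), 1)

    return sorted(candidates, key=score, reverse=True)[:top_n]


def build_cited_prompt(query, top_chunks):
    """Number the sources so the model can cite them as [n]."""
    sources = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(top_chunks, 1))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical sample corpus, for illustration only.
CHUNKS = [
    {"id": "cv-1", "text": "Built a RAG pipeline over 100K+ government documents."},
    {"id": "cv-2", "text": "Built an NL-to-SQL system used by 200+ non-technical staff."},
    {"id": "cv-3", "text": "Shipped a vision pipeline saving 2,000 staff hours every month."},
]

query = "Which system handles NL-to-SQL?"
candidates = retrieve(query, CHUNKS, top_k=10)   # step 2a: fetch candidates
top = rerank(query, candidates, top_n=5)         # step 2b: reorder, keep the best
prompt = build_cited_prompt(query, top)          # step 3: prompt for cited generation
print(prompt)
```

Retrieving more candidates than the generator will see (10 in, 5 out) is the design choice worth noting: the cheap vector search casts a wide net, and the slower reranker decides what actually reaches the prompt.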
Live RAG Demo
Pinecone · Gemini · Cohere · Groq Llama 3.3
I shipped an MCP server that exposes my profile, projects, skills and CV corpus as live tools. Any MCP-compatible client (Claude Desktop, Cursor, etc.) can query, search and check fit programmatically.
get_profile
Bio, role, location
list_projects
All shipped projects
search_cv
Semantic search the corpus
“Measure or it didn't happen. Latency, accuracy, cost — define them first.”
/ testimonials
“Mazhar architected our RAG document intelligence system from scratch — 100K+ government documents, retrievable in seconds. His ability to translate a vague business problem into a precise, production-ready AI architecture is rare. He delivered on time, measured everything, and the system has run without issues for over a year.”
Senior Director
Digital Transformation · Statistics Centre Abu Dhabi (SCAD)
Direct stakeholder
“He built the NL-to-SQL analytics platform that changed how 200+ of our non-technical staff work. What impressed me most wasn't the technology — it was his insistence on measuring accuracy before and after every change. He doesn't ship until the numbers say it's ready.”
Head of Analytics
Data & Analytics · SCAD
Internal client
“Mazhar led the backend architecture for Tasheel — one of the highest-traffic government service platforms in the UAE. He brought the kind of calm, systematic thinking that made a complex distributed system feel simple. His code reviews alone upskilled the entire team.”
Engineering Manager
Platform Engineering · MoHRE UAE
Direct manager
/ writing
Real architectures, real numbers, hard-won lessons. No fluff.
Chunking strategies, re-ranking, hybrid search, eval frameworks — the six decisions that separate a demo from a system that runs 24/7 in front of thousands of users.
Few-shot, chain-of-thought, function calling, and system-prompt hygiene — with real examples from the systems I've shipped and the cost/quality trade-offs of each.
How schema injection, intent classification, execution-aware repair loops, and a good evaluation harness got our conversational analytics platform to production-grade accuracy.
The prompt engineering framework we built at SCAD — caching, token budgeting, model routing, and eval-driven iteration — that saved tens of thousands annually.
/ newsletter
~1 email a month. Real lessons from production AI. No spam, no recycled LinkedIn content.
/ contact me
The fastest path is a 15-minute Calendly call — no slide deck, no prep, just questions about your problem and whether I'm the right person for it.