Available · June 2026

I architect AI systems
that ship.

15+ years building production software · 3+ years shipping enterprise LLM systems for government. RAG, NL-to-SQL, document intelligence — the kind that runs 24/7, not the kind that demos well.

Cut document research time from 2 hours to 10 seconds.

Built an AI chatbot now handling 18K+ queries a month.

Made 200+ non-technical staff fluent in SQL — without SQL.

Shipped a vision pipeline saving 2,000 staff hours every month.

Reduced GPT-4 API costs by 38% while improving response quality.

Ask my AI anything

Mazhar Hayat
AI Solutions Architect // Senior Engineer
Abu Dhabi, United Arab Emirates
Production-grade AI · since 2023
Agentic AI · Multi-Agent Orchestration · AutoGen · LangGraph · CrewAI · Tool Use · Function Calling · ReAct · Reflexion · MCP · A2A Protocols · RAG · GraphRAG · Hybrid Search · Re-ranking · Semantic Caching · Evals & Guardrails · Prompt Engineering · Fine-tuning · LoRA / QLoRA · Vision AI · NL-to-SQL · GPT-4o · GPT-4.1 · Claude 3.5 Sonnet · Gemini 1.5 Pro · Llama 3.1 · Mistral · Groq · Azure OpenAI · LangChain · LlamaIndex · Semantic Kernel · Pinecone · Weaviate · FAISS · Azure AI Search · .NET Core 8 · Angular 17 · Kubernetes · Docker

15+ Years · 100K+ Docs Indexed · 18K+ Monthly Queries · 95% Time Reduction

Where I've shipped

  • SCAD · Statistics Centre Abu Dhabi
  • MoHRE · UAE Ministry of Human Resources & Emiratisation (UAE Government)
  • NETSOL Technologies · Asset Finance Platform
  • TRG Tech (TRG Holdings) · Sentiment Engine

What I build with

  • Azure OpenAI
  • Pinecone
  • Anthropic
  • Cohere
  • Groq
  • LangChain

Logos shown are the property of their respective owners and used here solely to identify past clients and employers. No endorsement or partnership is implied.

Shipped systems for · Trained by

  • SCAD · Statistics Centre Abu Dhabi
  • MoHRE · UAE Ministry of Human Resources & Emiratisation
  • Microsoft AI Developer Program · Trained Aug–Sep 2025
  • 15+ Years · Shipping production software

/ about

Engineer first. Architect second.

I've spent 15+ years shipping software — from leasing platforms in Lahore to government services in Abu Dhabi to LLM systems running 24/7 today.

The pattern that kept showing up: the hard part is never the model. It's the chunking, the retrieval, the prompts, the evals, the cost. The boring stuff between the demo and the system.

I now spend my days on production LLM architecture — RAG pipelines over 100K+ documents, NL-to-SQL across eight databases, vision pipelines that save thousands of staff hours a month. The work is technical, opinionated, and end-to-end.

Currently · Abu Dhabi, United Arab Emirates. Available from June 2026 for senior IC, principal, or hands-on tech-lead roles.

15+
Years Shipping Software
0+
Production AI Systems
100K+
Documents Processed
18K+
Monthly AI Queries
95%
Research Time Saved
2K+ hrs
Staff Hours Saved / Mo

evidence over claims ✓

/ operating principles

The four rules I actually work by.

  1. Production beats prototype.
     A demo is a hypothesis. Shipped software in front of real users is the only reliable signal.

  2. Architecture is a forcing function.
     Choose the system shape that makes the right thing easy and the wrong thing visible.

  3. Measure or it didn't happen.
     Latency, accuracy, cost, satisfaction: define them, instrument them, then iterate on numbers.

  4. AI is plumbing, not magic.
     Retrieval, evaluation, guardrails, observability: the boring layers are what make the magic work.

Lesson · 02

The bottleneck is never the model — it's chunking, retrieval, and prompts.

/ systems in production

Three pipelines, drawn exactly how they run.

Not marketing diagrams — these are the actual shapes of systems running today at government scale. Hover any node for what it really does.

rag.pipeline

live
[Diagram: RAG pipeline in four stages: CORPUS → INDEX → RETRIEVE → GENERATE · CITE]

  • Corpus: 100K+ docs (AR + EN, policy · stats), 36 chunks live
  • Index: semantic chunking (300 tok) → Gemini embeddings (3072d) → Pinecone (serverless · free)
  • Retrieve: user query → hybrid retriever (BM25 + vector · RRF) → Cohere cross-encoder re-rank → top-5 chunks
  • Generate · cite: Groq · Llama 3.3 70B (streaming · <800ms) → answer + citations
  • Eval harness: 200-question gold set runs on every change
  • 92% accuracy · 65% cost cut · 5K+ queries/mo · <2s end-to-end

Lesson · 01

A demo is a hypothesis. Production is the only evidence.

/ services

What I do, and how I do it.


Production RAG systems

Retrieval pipelines that scale to millions of documents — chunking, hybrid search, reranking, citation-by-default prompting.

92% accuracy · 100K+ docs
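
The hybrid-search step can be illustrated with reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. This is a minimal sketch, not the production code: the document ids are hypothetical, and `k=60` is a commonly used default rather than a value taken from the systems described here.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword (BM25) search and a vector search.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why fusion tends to be more robust than either retriever alone. A reranker would then reorder only this short fused list.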

Conversational AI

NL-to-SQL, copilots, multi-turn agents. Schema injection, execution-aware repair loops, multilingual (Arabic + English).

85%+ SQL accuracy · 200+ users
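
Schema injection and an execution-aware repair loop can be sketched as follows. This is an illustrative outline under stated assumptions: `fake_llm` is a hypothetical stand-in for a real model call, the `staff` table is invented, and SQLite replaces the real databases. A production version would also run queries on a read-only connection.

```python
import sqlite3

def build_prompt(question, schema, error=None, previous_sql=None):
    """Schema injection: put the table definitions into the prompt."""
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}\n"
    if error:
        prompt += f"Previous SQL: {previous_sql}\nDatabase error: {error}\nFix the query.\n"
    return prompt

def answer(question, conn, schema, generate_sql, max_attempts=3):
    """Generate SQL, run it, and feed any database error back for a retry."""
    error, sql = None, None
    for _ in range(max_attempts):
        sql = generate_sql(build_prompt(question, schema, error, sql))
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # the next prompt includes this message
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts: {error}")

# Hypothetical model: uses a wrong column first, corrects it after seeing an error.
def fake_llm(prompt):
    if "Database error" in prompt:
        return "SELECT COUNT(*) FROM staff WHERE dept = 'Stats'"
    return "SELECT COUNT(*) FROM staff WHERE department = 'Stats'"

schema = "CREATE TABLE staff (name TEXT, dept TEXT)"
conn = sqlite3.connect(":memory:")
conn.execute(schema)
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("A", "Stats"), ("B", "IT"), ("C", "Stats")])
sql, rows = answer("How many people work in Stats?", conn, schema, fake_llm)
print(sql, rows)
```

The loop treats the database itself as the validator: a query that fails to execute never reaches the user, and the error text gives the model something concrete to repair.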

Document intelligence

Computer-vision pipelines with confidence-gated human review. Type-aware routing to the cheapest extractor that works.

2,000+ hrs/month saved
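
Type-aware routing with a confidence gate might look like the sketch below. The extractors, document types, confidence values, and the 0.85 threshold are all hypothetical placeholders chosen for illustration. Real extractors would call an OCR or vision service.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    fields: dict
    confidence: float  # 0.0 to 1.0, as reported by the extractor

# Hypothetical extractors; the confidence values are hard-coded for the demo.
def template_extractor(doc):
    return Extraction({"id": doc["text"][:7]}, 0.97 if doc["clean"] else 0.60)

def vision_extractor(doc):
    return Extraction({"id": doc["text"][:7]}, 0.90 if doc["clean"] else 0.70)

# Each document type lists its extractors, cheapest first.
ROUTES = {
    "structured_form": [template_extractor, vision_extractor],
    "scanned_letter": [vision_extractor],
}
THRESHOLD = 0.85  # assumed gate; tune against a labelled sample

def process(doc):
    """Try extractors cheapest first; escalate to a human if none is confident."""
    best = None
    for extractor in ROUTES[doc["type"]]:
        result = extractor(doc)
        if result.confidence >= THRESHOLD:
            return "auto", result
        if best is None or result.confidence > best.confidence:
            best = result
    return "human_review", best

print(process({"type": "structured_form", "text": "INV-001 total 40", "clean": True}))
print(process({"type": "structured_form", "text": "INV-002 smudged", "clean": False}))
```

The expensive extractor only runs when the cheap one is unsure, and a reviewer only sees documents that no extractor handled confidently.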

LLM evaluation harnesses

Gold-set evaluation, regression tracking, cost-aware model routing. The boring stuff that turns demos into systems.

Evals are the new unit tests
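
A gold-set evaluation with a regression gate can be as small as the sketch below. Exact-match scoring, the sample questions, and the 0.01 tolerance are simplifying assumptions. Real harnesses often score with rubric- or model-based graders instead.

```python
def run_eval(system, gold_set):
    """Return the fraction of gold questions the system answers correctly."""
    correct = sum(
        1 for question, expected in gold_set
        if system(question).strip().lower() == expected.lower()
    )
    return correct / len(gold_set)

def check_regression(score, baseline, tolerance=0.01):
    """Pass only if accuracy has not dropped more than `tolerance` below baseline."""
    return score >= baseline - tolerance

# Hypothetical gold set and a canned "system" that gets one answer wrong.
gold_set = [
    ("capital of UAE?", "Abu Dhabi"),
    ("2 + 2?", "4"),
    ("largest emirate?", "Abu Dhabi"),
    ("UAE currency?", "Dirham"),
]
answers = {
    "capital of UAE?": "Abu Dhabi",
    "2 + 2?": "4",
    "largest emirate?": "Dubai",
    "UAE currency?": "dirham ",
}
score = run_eval(lambda q: answers[q], gold_set)
print(score, check_regression(score, baseline=0.75))
```

Run in CI, `check_regression` plays the role of a failing unit test: a prompt or retrieval change that lowers the score blocks the merge.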

Prompt + cost engineering

Few-shot, chain-of-thought, function calling. Caching, model routing, token budgeting — without hurting quality.

38% GPT-4 cost cut
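
Caching, model routing, and token budgeting can be combined in one small wrapper. The sketch below is illustrative only: whitespace splitting stands in for a real tokenizer, the keyword rule stands in for a real difficulty classifier, and the two models are hypothetical lambdas.

```python
import hashlib

def count_tokens(text):
    # Crude stand-in: a real system would use the model's own tokenizer.
    return len(text.split())

def fit_budget(chunks, budget):
    """Keep chunks in ranked order until the token budget is used up."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

class CachedRouter:
    def __init__(self, cheap, strong, budget=12, hard_words=("compare", "why")):
        self.cheap, self.strong = cheap, strong
        self.budget, self.hard_words = budget, hard_words
        self.cache = {}
        self.calls = {"cheap": 0, "strong": 0, "cache": 0}

    def ask(self, question, chunks):
        context = "\n".join(fit_budget(chunks, self.budget))
        key = hashlib.sha256(f"{question}\n{context}".encode()).hexdigest()
        if key in self.cache:  # exact-match cache: no model call at all
            self.calls["cache"] += 1
            return self.cache[key]
        # Assumed routing rule: questions that look hard go to the strong model.
        tier = "strong" if any(w in question.lower() for w in self.hard_words) else "cheap"
        model = self.strong if tier == "strong" else self.cheap
        self.calls[tier] += 1
        self.cache[key] = model(question, context)
        return self.cache[key]

router = CachedRouter(cheap=lambda q, c: f"cheap:{q}", strong=lambda q, c: f"strong:{q}")
chunks = ["one two three four five", "six seven eight nine ten", "eleven twelve thirteen"]
router.ask("What is the total?", chunks)
router.ask("What is the total?", chunks)  # served from the cache
router.ask("Why did the total change?", chunks)
print(router.calls)
```

Each lever cuts a different cost: the cache removes repeat calls, the router keeps easy questions off the expensive model, and the budget caps prompt size. An eval harness is what confirms that none of them hurt quality.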

Technical leadership

Mentor mid-level engineers, run AI architecture reviews, write the docs everyone actually reads.

15+ yrs, 5+ teams led

/ portfolio

Selected work, production grade.

Four flagship systems. Each one was an experiment for someone, somewhere. Now they all run 24/7 in front of real users.


2025 · SCAD

Featured

Enterprise RAG Document Intelligence System

Architected end-to-end RAG pipeline using Azure OpenAI and Cognitive Search with hybrid retrieval and re-ranking.

  • Reduced information retrieval time from 2–3 hours to under 10 seconds
  • Achieved 92% accuracy on complex multi-document queries
  • Processing 5K+ queries monthly with 87% user satisfaction
GPT-4 · Azure Cognitive Search · LangChain · Pinecone · Azure Functions

2023 – Present · SCAD

Featured

Intelligent Conversational Analytics Platform

Built natural language to SQL query system with conversational interface and automatic error correction.

  • Enabled 200+ non-technical staff to query databases using plain English
  • Handles 15K+ queries monthly across 8 different databases
  • Reduced analytics request backlog by 70%
GPT-4 · Semantic Kernel · Azure OpenAI · .NET Core Web API · Angular

2025 · SCAD

Featured

Document Processing & Vision AI Pipeline

Built intelligent document processing pipeline using GPT-4 Vision and Azure Form Recognizer.

  • Automated 80% of document classification and data extraction tasks
  • Reduced processing time from 15 minutes to 30 seconds per document
  • 94% accuracy on structured form extraction
GPT-4 Vision · Azure Form Recognizer · Azure Functions · Blob Storage · Cosmos DB

/ stack

The toolkit behind the systems.

Green chips link to a project or article where I actually used the thing — proof, not just labels.

01

Generative AI & LLMs

Building with frontier models in production.

02

RAG & Vector Search

Retrieval pipelines that scale to millions of docs.

Multi-stage Retrieval · Hybrid Search · Re-ranking · Pinecone · FAISS · Chroma · Azure Cognitive Search · Weaviate · OpenAI Embeddings · Sentence Transformers

03

AI Frameworks

Orchestrating agents, tools, and memory.

LangChain · Semantic Kernel · LlamaIndex · Haystack · Azure Bot Framework · MLflow · Azure ML

04

Cloud & Architecture

Secure, event-driven, cloud-native platforms.

05

Full Stack

End-to-end systems from API to UI.

.NET Core 8 · ASP.NET Web API · Angular 17 · React · TypeScript · Node.js · SQL Server · Redis Cache · CI/CD Pipelines

06

Conversational AI & NLP

Natural dialogue that resolves real problems.

Multi-turn Dialogue · Intent Classification · Entity Extraction · Sentiment Analysis · Document Understanding · Summarization · Q&A Systems

/ experience

git log --author="mazhar"


Nov 2022 – Present

Abu Dhabi, UAE

Senior System Analyst / AI Platforms

@ Statistics Centre — Abu Dhabi (SCAD)

  • Architected enterprise RAG system processing 100K+ documents using GPT-4 and Azure Cognitive Search, reducing research time by 95%
  • Deployed conversational AI chatbot handling 18K+ monthly queries with 90% first-contact resolution, cutting support costs by 43%
  • Built natural language SQL query interface enabling non-technical users to access 8 databases, processing 15K+ queries monthly
  • Designed prompt engineering framework reducing GPT-4 API costs by 38% while improving response quality by 15%
  • Led migration of 8 legacy monolithic applications to AI-enhanced microservices
GPT-4 · Azure OpenAI · LangChain · Semantic Kernel · Pinecone · .NET Core 8 · Angular 17 · Kubernetes

July 2018 – Nov 2022

UAE

Senior Full Stack Engineer / Team Lead

@ Ministry of Human Resources & Emiratisation (MoHRE)

  • Led development of Tasheel Systems — 50+ labor and HR services applications serving 2M+ users annually
  • Modernized legacy codebase to microservices architecture, improving performance by 45%
  • Designed RESTful APIs consumed by 30+ internal and external systems with OAuth 2.0
  • Implemented Redis caching strategy reducing database load by 55%
  • Led team of 6 developers using Agile/Scrum with 95%+ sprint completion rate
.NET Core 5/6 · Angular 12-14 · React · Docker · Kubernetes · Azure DevOps · Redis · AWS

June 2015 – June 2018

Lahore, Pakistan

Senior Software Engineer / Technical Lead

@ TRG Tech

  • Led cross-functional team of 8 developers (Full Stack, iOS, Android)
  • Built real-time sentiment analysis engine processing 100K+ social media posts daily
  • Developed social media monitoring platform integrating Twitter, Facebook, Instagram APIs
  • Architected data pipelines processing 1M+ records daily for business intelligence
.NET Framework 4.6 · Angular · Node.js · SQL Server · Social Media APIs · Sentiment Analysis

Dec 2012 – June 2015

Lahore, Pakistan

Software Developer

@ NETSOL Technologies

  • Maintained and enhanced large-scale financial leasing suite for international clients (FIAT, CNH Industrial)
  • Delivered 20+ features for enterprise financial management system
  • Reduced bug count by 35% through code refactoring and unit testing
.NET Framework 4.5 · ASP.NET MVC · AngularJS · SQL Server · Crystal Reports

/ training & programs

Trained by Microsoft. Applied in production.

Microsoft Official Course completions and self-paced learning paths. I haven’t sat the AI-102 / AZ-305 exams yet — the knowledge is applied daily in production at SCAD.

Trained

Azure AI Engineer track (AI-102)

Training completed · exam not yet attempted

Worked through the full AI-102 curriculum — Azure OpenAI, Cognitive Services, knowledge mining, conversational AI — applied directly in production at SCAD.

Microsoft Learn

Trained

Azure Solutions Architect track (AZ-305)

Training completed · exam not yet attempted

Self-paced study of architecture design patterns for Azure, identity, governance, data platform, and business-continuity design.

Microsoft Learn

/ live demo

This site runs RAG on itself.

Ask anything about my CV and watch the full retrieval pipeline run in real time — embed, retrieve, rerank, generate with citations. All free-tier infrastructure.

01

Embed

Your question becomes a 3072-dim vector with Gemini Embedding 001.

02

Retrieve + Rerank

Pinecone fetches top-10 candidates; Cohere rerank-v3.5 reorders them.

03

Cited generation

Groq Llama 3.3 70B synthesises a cited answer from the top-5 chunks in ~1 s.

16 chunks · Pinecone serverless · Gemini · Cohere · Groq · $0/month
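
The three steps above can be sketched end to end with toy stand-ins. Nothing here calls Gemini, Pinecone, Cohere, or Groq: `embed` is a hashed bag-of-words, `retrieve` is a brute-force cosine search, and `rerank` counts shared words. All are assumptions made so the example runs offline. The final prompt is what would be sent to the generation model.

```python
import math

def embed(text, dim=8):
    """Toy embedding (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query_vec, index, top_k=10):
    """Vector search: rank chunks by cosine similarity (vectors are unit length)."""
    scored = [(sum(a * b for a, b in zip(query_vec, vec)), cid) for cid, vec in index.items()]
    return [cid for _, cid in sorted(scored, reverse=True)[:top_k]]

def rerank(query, candidates, chunks, top_n=5):
    """Toy reranker (stand-in for a cross-encoder): count shared words."""
    q = set(query.lower().split())
    return sorted(candidates, key=lambda cid: -len(q & set(chunks[cid].lower().split())))[:top_n]

def build_prompt(query, chunk_ids, chunks):
    """Number the chunks so the model can cite them as [1], [2], ..."""
    sources = "\n".join(f"[{i}] {chunks[cid]}" for i, cid in enumerate(chunk_ids, 1))
    return f"Answer using only the sources and cite them.\n{sources}\n\nQuestion: {query}"

# Hypothetical three-chunk corpus.
chunks = {
    "rag": "RAG system cut research time to 10 seconds",
    "sql": "NL-to-SQL platform serves 200+ staff",
    "vision": "Vision pipeline saves 2,000 staff hours",
}
index = {cid: embed(text) for cid, text in chunks.items()}
query = "research time"
top = rerank(query, retrieve(embed(query), index), chunks)
prompt = build_prompt(query, top, chunks)
print(prompt)
```

Numbering the sources inside the prompt is what makes citation cheap: the model only has to echo an index, and the UI can map it back to a chunk.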

new · Model Context Protocol

Talk to my portfolio from Claude.

I shipped an MCP server that exposes my profile, projects, skills and CV corpus as live tools. Any MCP-compatible client (Claude Desktop, Cursor, etc.) can query, search and check fit programmatically.


get_profile

Bio, role, location

list_projects

All shipped projects

search_cv

Semantic search the corpus
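
To show the shape of the exchange, here is a minimal sketch of how a `tools/call` request for these three tools could be dispatched. It is not the actual server: tool arguments (such as `query`) are assumptions, `search_cv` uses a substring match instead of semantic search, and a real server would typically be built on an MCP SDK with a proper transport, tool listing, and input schemas.

```python
import json

PROFILE = {
    "name": "Mazhar Hayat",
    "role": "AI Solutions Architect",
    "location": "Abu Dhabi, United Arab Emirates",
}
PROJECTS = [
    "Enterprise RAG Document Intelligence System",
    "Intelligent Conversational Analytics Platform",
    "Document Processing & Vision AI Pipeline",
]

def get_profile():
    return PROFILE

def list_projects():
    return PROJECTS

def search_cv(query):
    # Substring match as a stand-in for semantic search.
    return [p for p in PROJECTS if query.lower() in p.lower()]

TOOLS = {"get_profile": get_profile, "list_projects": list_projects, "search_cv": search_cv}

def error(request, code, message):
    return {"jsonrpc": "2.0", "id": request.get("id"), "error": {"code": code, "message": message}}

def handle(request):
    """Answer one JSON-RPC 2.0 'tools/call' request with an MCP-style result."""
    if request.get("method") != "tools/call":
        return error(request, -32601, "Method not found")
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return error(request, -32602, f"Unknown tool: {params.get('name')}")
    payload = tool(**params.get("arguments", {}))
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": False},
    }

response = handle({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_cv", "arguments": {"query": "vision"}},
})
print(response)
```

The client never sees the data model, only named tools and text results, so the corpus behind `search_cv` can change without any client-side update.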

Lesson · 03

Measure or it didn't happen. Latency, accuracy, cost — define them first.

/ testimonials

What the people who shipped with me say.

Mazhar architected our RAG document intelligence system from scratch — 100K+ government documents, retrievable in seconds. His ability to translate a vague business problem into a precise, production-ready AI architecture is rare. He delivered on time, measured everything, and the system has run without issues for over a year.

Senior Director

Digital Transformation · Statistics Centre Abu Dhabi (SCAD)

Direct stakeholder

He built the NL-to-SQL analytics platform that changed how 200+ of our non-technical staff work. What impressed me most wasn't the technology — it was his insistence on measuring accuracy before and after every change. He doesn't ship until the numbers say it's ready.

Head of Analytics

Data & Analytics · SCAD

Internal client

Mazhar led the backend architecture for Tasheel — one of the highest-traffic government service platforms in the UAE. He brought the kind of calm, systematic thinking that made a complex distributed system feel simple. His code reviews alone upskilled the entire team.

Engineering Manager

Platform Engineering · MoHRE UAE

Direct manager

/ writing

Field notes from shipping AI.

Real architectures, real numbers, hard-won lessons. No fluff.

Nov 2025 · 12 min read · coming soon

RAG from prototype to production: what nobody tells you

Chunking strategies, re-ranking, hybrid search, eval frameworks — the six decisions that separate a demo from a system that runs 24/7 in front of thousands of users.

RAG · LangChain · Azure OpenAI · Production

Sep 2025 · 9 min read · coming soon

Prompt engineering patterns I actually use in production

Few-shot, chain-of-thought, function calling, and system-prompt hygiene — with real examples from the systems I've shipped and the cost/quality trade-offs of each.

GPT-4 · Prompt Engineering · Azure OpenAI

Jul 2025 · 10 min read · coming soon

Getting NL-to-SQL to 85%+ accuracy without fine-tuning

How schema injection, intent classification, execution-aware repair loops, and a good evaluation harness got our conversational analytics platform to production-grade accuracy.

Semantic Kernel · SQL · GPT-4 · NLP

May 2025 · 7 min read · coming soon

Cutting GPT-4 API costs 38% without hurting quality

The prompt engineering framework we built at SCAD — caching, token budgeting, model routing, and eval-driven iteration — that saved tens of thousands annually.

Cost Optimisation · GPT-4 · LLMOps

/ newsletter

New articles & shipped projects — straight to your inbox.

~1 email a month. Real lessons from production AI. No spam, no recycled LinkedIn content.

/ contact me

Come on, let's talk.

The fastest path is a 15-minute Calendly call — no slide deck, no prep, just questions about your problem and whether I'm the right person for it.