All writing

Field notes from shipping AI.

Real architectures, real numbers, hard-won lessons from production RAG, NL-to-SQL, and prompt-engineering work. No fluff.

  • Apr 2026 · 12 min read

    How We Cut Document Research From 2 Hours to 10 Seconds

    The architecture, trade-offs, and hard lessons from building an enterprise RAG system at SCAD that now handles 5,000+ queries a month with 92% accuracy.

    RAG · Azure OpenAI · Pinecone · Enterprise AI
    Read article
  • Nov 2025 · 12 min read · coming soon

    RAG from prototype to production: what nobody tells you

    Chunking strategies, re-ranking, hybrid search, and eval frameworks are among the six decisions that separate a demo from a system that runs 24/7 in front of thousands of users.

    RAG · LangChain · Azure OpenAI · Production
  • Sep 2025 · 9 min read · coming soon

    Prompt engineering patterns I actually use in production

    Few-shot, chain-of-thought, function calling, and system-prompt hygiene — with real examples from the systems I've shipped and the cost/quality trade-offs of each.

    GPT-4 · Prompt Engineering · Azure OpenAI
  • Jul 2025 · 10 min read · coming soon

    Getting NL-to-SQL to 85%+ accuracy without fine-tuning

    How schema injection, intent classification, execution-aware repair loops, and a good evaluation harness got our conversational analytics platform to production-grade accuracy.

    Semantic Kernel · SQL · GPT-4 · NLP
  • May 2025 · 7 min read · coming soon

    Cutting GPT-4 API costs 38% without hurting quality

    The prompt engineering framework we built at SCAD — caching, token budgeting, model routing, and eval-driven iteration — that saved tens of thousands annually.

    Cost Optimisation · GPT-4 · LLMOps