/ writing
Field notes from shipping AI.
Real architectures, real numbers, hard-won lessons from production RAG, NL-to-SQL, and prompt-engineering work. No fluff.
- Apr 2026 12 min read
How We Cut Document Research From 2 Hours to 10 Seconds
The architecture, trade-offs, and hard lessons from building an enterprise RAG system at SCAD that now handles 5,000+ queries a month with 92% accuracy.
RAG, Azure OpenAI, Pinecone, Enterprise AI
- Nov 2025 12 min read (coming soon)
RAG from prototype to production: what nobody tells you
Chunking strategies, re-ranking, hybrid search, eval frameworks — the six decisions that separate a demo from a system that runs 24/7 in front of thousands of users.
RAG, LangChain, Azure OpenAI, Production
- Sep 2025 9 min read (coming soon)
Prompt engineering patterns I actually use in production
Few-shot, chain-of-thought, function calling, and system-prompt hygiene — with real examples from the systems I've shipped and the cost/quality trade-offs of each.
GPT-4, Prompt Engineering, Azure OpenAI
- Jul 2025 10 min read (coming soon)
Getting NL-to-SQL to 85%+ accuracy without fine-tuning
How schema injection, intent classification, execution-aware repair loops, and a good evaluation harness got our conversational analytics platform to production-grade accuracy.
Semantic Kernel, SQL, GPT-4, NLP
- May 2025 7 min read (coming soon)
Cutting GPT-4 API costs 38% without hurting quality
The prompt engineering framework we built at SCAD (caching, token budgeting, model routing, and eval-driven iteration) that saved tens of thousands in API spend annually.
Cost Optimisation, GPT-4, LLMOps