02 · 2023 – Present · SCAD

Intelligent Conversational Analytics Platform

Teaching SQL to 200+ people who can't write SQL — through plain English (and Arabic).

Architecture

Figure: nl_to_sql.flow (live). Pipeline stages: Input → Reason → Execute → Validate · Repair · Deliver.

  • Input: user question, e.g. "Top 5 sectors by employment in 2024?" (EN · AR · multilingual)
  • Reason: intent + schema (semantic-kernel), then SQL synthesis (GPT-4 · 80 few-shots)
  • Execute: 8 databases, read-only (shown: census_2024, labor_force, trade_stats, demographics)
  • Generated SQL: SELECT sector, SUM(emp) AS total FROM census_2024 WHERE year = 2024 GROUP BY sector ORDER BY total DESC LIMIT 5;
  • Validate + repair (execution-aware): +13% accuracy from repair loop
  • Deliver: result set (5 rows · 47 ms) → chart + explanation
  • 200+ non-tech users · 18K+ queries/mo · 85% accuracy · row-level security

Before

Data analysts were a bottleneck. Every report request waited 3-5 days in their queue. Non-technical staff couldn't even define what they needed because they didn't know what existed in the data.

After

200+ staff now query 8 databases in plain language, getting answers in seconds. The analytics backlog dropped 70%. Analysts moved up the stack to harder problems.

Challenge

Non-technical users needed SQL database access without coding knowledge.

Approach

Built a natural-language-to-SQL query system with a conversational interface and automatic error correction.

How it was built

  1. Schema audit

    Weeks 1–3

    Catalogued all 8 production databases — 240 tables, 3,200 columns, many cryptically named in mixed Arabic-English transliterations. Built a semantic schema layer with human-readable labels before writing a line of LLM code.

  2. Few-shot SQL generation

    Weeks 4–6

    Curated 80 question→SQL examples spanning the most common query patterns. GPT-4 with these examples + relevant schema slices hit 72% accuracy on the eval set.

  3. Execution-aware repair

    Weeks 7–9

    Added a repair loop: when generated SQL throws an error, the error message goes back to the model with the original schema for a corrected attempt. Lifted accuracy from 72% to 85%.

  4. Safety + access control

    Weeks 10–11

    Read-only DB users per role, query whitelisting, row-level filters. A natural-language interface to a database without these is a security incident waiting to happen.

  5. Conversational UX

    Weeks 12–14

    Multi-turn refinement, result explanation, follow-up suggestions. The chat UI is what made non-technical users actually adopt it. The model was already good enough.
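
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal Python illustration, not the project's code (the stack listed under Tech stack is .NET). The `Column`, `Example`, `SCHEMA`, and `EXAMPLES` names, the labels, and the word-overlap ranking are all assumptions made up for the sketch; a real system would more likely rank examples by embedding similarity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str   # physical column name as stored in the database
    label: str  # human-readable label added during the schema audit

@dataclass(frozen=True)
class Example:
    question: str
    sql: str

# Hypothetical slice of a semantic schema layer (labels are invented).
SCHEMA = {
    "census_2024": [
        Column("sector", "Economic sector"),
        Column("emp", "Number of employed persons"),
        Column("year", "Reference year"),
    ],
}

# Hypothetical curated examples; the write-up mentions 80 of them.
EXAMPLES = [
    Example("Total employment by sector in 2023?",
            "SELECT sector, SUM(emp) FROM census_2024 WHERE year = 2023 GROUP BY sector;"),
    Example("How many rows are in the census table?",
            "SELECT COUNT(*) FROM census_2024;"),
]

def _words(text: str) -> set[str]:
    """Lowercased words with trailing punctuation removed."""
    return {w.strip("?.,").lower() for w in text.split()}

def pick_examples(question: str, k: int = 1) -> list[Example]:
    """Rank examples by naive word overlap with the question (assumption)."""
    q = _words(question)
    ranked = sorted(EXAMPLES, key=lambda e: len(q & _words(e.question)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, tables: list[str], k: int = 1) -> str:
    """Assemble labelled schema slices, k few-shot examples, and the question."""
    lines = ["Schema:"]
    for table in tables:
        for col in SCHEMA[table]:
            lines.append(f"  {table}.{col.name}: {col.label}")
    lines.append("Examples:")
    for ex in pick_examples(question, k):
        lines.append(f"  Q: {ex.question}")
        lines.append(f"  SQL: {ex.sql}")
    lines.append(f"Q: {question}")
    lines.append("SQL:")
    return "\n".join(lines)

prompt = build_prompt("Top 5 sectors by employment in 2024?", ["census_2024"])
```

The point of the labels is that the model sees "Number of employed persons" next to `emp`, so a cryptic column name no longer has to be guessed from the name alone.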

Key architecture decisions

Schema-aware context injection over fine-tuning

Why · Fine-tuning would lock us to one schema version. Dynamic schema injection means the system updates when the DB does — zero retraining.
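
A small Python sketch of what "dynamic schema injection" can look like, assuming the schema text is rebuilt from the live database on each request. SQLite stands in for the real databases here, and `schema_context` is a hypothetical helper name; on SQL Server the same information would typically come from the `INFORMATION_SCHEMA.COLUMNS` view instead of `PRAGMA table_info`.

```python
import sqlite3

def schema_context(conn: sqlite3.Connection) -> str:
    """Describe the current tables and columns, read live from the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = []
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_list = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"{table}({col_list})")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census_2024 (sector TEXT, emp INTEGER, year INTEGER)")
before = schema_context(conn)

# A schema change shows up in the next prompt with no retraining step.
conn.execute("ALTER TABLE census_2024 ADD COLUMN region TEXT")
after = schema_context(conn)
```

Because the description is regenerated per request, the new `region` column appears in `after` without touching the model.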

Read-only DB user with row-level security

Why · Defence in depth. Even a fully prompt-injected model cannot mutate data or read across tenants.

Execution-aware repair loop

Why · Generated SQL fails for predictable reasons (typos, ambiguous joins). Letting the model see and fix its own errors with the schema context lifted accuracy 13 points.
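
The loop itself is short. Below is a minimal Python sketch under stated assumptions: SQLite replaces the real databases, `answer` is a hypothetical function name, and `fake_model` is a stub standing in for the LLM call (it returns a wrong column name first, then a corrected query once it has seen an error).

```python
import sqlite3
from typing import Callable, Optional

# generate(question, schema, previous_sql, error) -> SQL string.
# In a real system this would call the model; here it is an injected stub.
Generator = Callable[[str, str, Optional[str], Optional[str]], str]

def answer(conn: sqlite3.Connection, question: str, schema: str,
           generate: Generator, max_attempts: int = 3):
    """Run generated SQL; on a database error, feed the error back and retry."""
    sql: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, schema, sql, error)
        try:
            rows = conn.execute(sql).fetchall()
            return rows, sql, attempt
        except sqlite3.Error as exc:
            error = str(exc)  # e.g. "no such column: employees"
    raise RuntimeError(f"no runnable SQL after {max_attempts} attempts: {error}")

def fake_model(question, schema, previous_sql, error):
    """Stub: the first attempt uses a column that does not exist."""
    if error is None:
        return "SELECT sector, SUM(employees) FROM census_2024 GROUP BY sector"
    return "SELECT sector, SUM(emp) FROM census_2024 GROUP BY sector ORDER BY sector"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census_2024 (sector TEXT, emp INTEGER, year INTEGER)")
conn.executemany("INSERT INTO census_2024 VALUES (?, ?, ?)",
                 [("health", 10, 2024), ("health", 5, 2024), ("trade", 7, 2024)])

schema = "census_2024(sector TEXT, emp INTEGER, year INTEGER)"
rows, final_sql, attempts = answer(conn, "Employment by sector?", schema, fake_model)
```

A loop like this only catches queries that raise an error. SQL that runs but answers the wrong question passes straight through, so it complements an eval set and does not replace one.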

Impact

  • Enabled 200+ non-technical staff to query databases using plain English
  • Handles 15K+ queries monthly across 8 different databases
  • Reduced analytics request backlog by 70%
  • 85% query accuracy with automatic error correction
200+ users · 15K+/mo queries · -70% backlog · 85% accuracy

What I'd tell someone building this

  • 01 · Schema design is more important than prompt design. Bad column names break the model long before bad prompts do.
  • 02 · The repair loop is more powerful than picking a bigger model.
  • 03 · Users want explanations as much as answers. "Here's the SQL I ran" builds trust.
  • 04 · Arabic column data needs explicit transliteration handling — don't assume the model will guess right.
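
One way to read point 04 in code: normalise spelling variants to a lookup key and resolve them through an explicit alias table before they reach the SQL. This Python sketch is an assumption about the shape of such handling, not the project's implementation. `match_key`, `register`, `resolve`, and the "Al Ain" value are hypothetical.

```python
import unicodedata
from typing import Optional

TATWEEL = "\u0640"  # Arabic elongation character, carries no meaning

def match_key(text: str) -> str:
    """Fold case, spacing, punctuation, diacritics, and hamza-on-alef variants."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = [ch for ch in decomposed
            if not unicodedata.combining(ch)  # drops harakat and hamza marks
            and ch != TATWEEL
            and ch.isalnum()]                 # drops spaces and hyphens
    return "".join(kept).casefold()

ALIASES: dict[str, str] = {}

def register(canonical: str, *variants: str) -> None:
    """Map the canonical stored value and each listed variant to it."""
    for value in (canonical, *variants):
        ALIASES[match_key(value)] = canonical

def resolve(term: str) -> Optional[str]:
    """Return the stored value for a user-typed term, or None if unknown."""
    return ALIASES.get(match_key(term))

# Hypothetical entry: the column stores "Al Ain"; users type many variants.
register("Al Ain", "العين")

matches = [resolve("al-ain"), resolve("ALAIN"), resolve("العَيْن")]
unknown = resolve("El Ein")
```

Normalisation only covers mechanical differences. A genuinely different romanisation, such as "El Ein" here, still needs its own `register` entry, which is the explicit part of the advice.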

"He didn't ship until the numbers said it was ready. The platform changed how 200 people work, and the team that maintains it after Mazhar's involvement hasn't had to call him once in eight months."
Head of Analytics · SCAD

Tech stack

GPT-4 · Semantic Kernel · Azure OpenAI · .NET Core Web API · Angular · SQL Server
