Portfolio

Transparency

This entire portfolio runs at $0 / month.

Live RAG against my CV. Streaming AI chat. Vector search. Cross-encoder reranking. Three LLM choices. Static deploys with previews. All of it on free tiers — and not in a hacky way. Here's the receipt.

Monthly cost: $0

Services composed: 8

End-to-end RAG latency: < 2 s

The stack

| Service | What it does | Tier | Cost / mo | Notes |
|---|---|---|---|---|
| Next.js 14 | | Open source | $0 | |
| Netlify | Hosting · CDN · build · preview deploys | Starter (free) | $0 | 100 GB bandwidth + 300 build min / mo, well within limits. |
| GitHub | | Free | $0 | |
| Pinecone | Vector database for RAG retrieval | Starter (free) | $0 | 1 serverless index, ~36 CV chunks, < 0.1% of free quota. |
| Google Gemini | Embeddings (gemini-embedding-001, 3072 dim) | Free tier | $0 | 60 RPM, 1,500 RPD. Portfolio traffic is nowhere near. |
| Groq | LLM inference (Llama 3.3 70B + 3.1 8B + Gemma 2) | Free tier | $0 | Sub-second latency on Llama 3.3 70B. Ridiculous quality-to-cost ratio. |
| Cohere | Cross-encoder reranking (rerank-v3.5) | Trial tier | $0 | 1,000 calls / month free, enough for portfolio traffic for years. |
| Vercel Analytics + Speed Insights | | Hobby (free) | $0 | |
| Total | | | $0 | |
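
To make the headroom claims concrete, here is a rough sketch that turns two of the limits quoted above into a daily chat-query budget. The limits come from the table. The per-query call counts (one embedding call and one rerank call per chat message), the 30-day month, and the names `FREE_TIER_LIMITS` and `dailyQueryBudget` are assumptions for illustration, not measurements from this site.

```typescript
// Free-tier limits as quoted in the table above, normalised to calls per day.
interface TierQuota {
  service: string;
  callsPerDay: number;
}

const DAYS_PER_MONTH = 30; // assumption: simple 30-day month

const FREE_TIER_LIMITS: TierQuota[] = [
  { service: "Gemini embeddings", callsPerDay: 1500 }, // 1,500 RPD
  { service: "Cohere rerank", callsPerDay: 1000 / DAYS_PER_MONTH }, // 1,000 calls / month
];

// Assumption: each chat query makes this many calls to each service.
const CALLS_PER_QUERY: Record<string, number> = {
  "Gemini embeddings": 1,
  "Cohere rerank": 1,
};

// Returns the tightest service and how many whole queries per day it allows.
function dailyQueryBudget(limits: TierQuota[]): { bottleneck: string; queriesPerDay: number } {
  let best = { bottleneck: "none", queriesPerDay: Infinity };
  for (const quota of limits) {
    const perQuery = CALLS_PER_QUERY[quota.service] ?? 1;
    const allowed = Math.floor(quota.callsPerDay / perQuery);
    if (allowed < best.queriesPerDay) {
      best = { bottleneck: quota.service, queriesPerDay: allowed };
    }
  }
  return best;
}

const budget = dailyQueryBudget(FREE_TIER_LIMITS);
console.log(`${budget.bottleneck} caps traffic at ~${budget.queriesPerDay} queries/day`);
```

Under these assumptions the rerank quota, not the embedding quota, is the first limit a traffic spike would reach, at roughly 33 reranked queries a day.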

Engineering principles behind it

Best-of-tier for each job

Groq for fast LLM. Cohere for rerank. Pinecone for vectors. Gemini for embeddings. Each one is independently the best free tier in its category — and they compose cleanly.
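
One way to read "compose cleanly" is that each provider sits behind a small function type, so any one of them can be swapped without touching the others. The sketch below works under that assumption. The type names, `buildPrompt`, `answer` and `stubProviders` are hypothetical, and the stubs stand in for the real Gemini, Pinecone, Cohere and Groq clients rather than calling them.

```typescript
// Hypothetical stage types: one per service role in the stack.
interface CvChunk { id: string; text: string; score: number }
type EmbedFn = (text: string) => Promise<number[]>;
type SearchFn = (vector: number[], topK: number) => Promise<CvChunk[]>;
type RerankStage = (query: string, chunks: CvChunk[]) => Promise<CvChunk[]>;
type GenerateFn = (prompt: string) => Promise<string>;

interface Providers {
  embed: EmbedFn;       // e.g. Gemini embeddings
  search: SearchFn;     // e.g. Pinecone query
  rerank: RerankStage;  // e.g. Cohere rerank
  generate: GenerateFn; // e.g. a Groq-hosted Llama model
}

// Pure helper: turn the selected chunks into a grounded prompt.
function buildPrompt(question: string, chunks: CvChunk[]): string {
  const context = chunks.map((c) => c.text).join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

// Embed -> vector search -> rerank -> generate, with every stage injected.
async function answer(question: string, p: Providers, topK = 3): Promise<string> {
  const vector = await p.embed(question);
  const candidates = await p.search(vector, topK * 2); // over-fetch, then rerank
  const ranked = (await p.rerank(question, candidates)).slice(0, topK);
  return p.generate(buildPrompt(question, ranked));
}

// In-memory stubs so the sketch runs without any API keys.
const stubProviders: Providers = {
  embed: async (text) => [text.length, 1],
  search: async (_vector, topK) =>
    [
      { id: "b", text: "Likes hiking.", score: 0.2 },
      { id: "a", text: "Built a RAG chat.", score: 0.9 },
    ].slice(0, topK),
  rerank: async (_query, chunks) => [...chunks].sort((x, y) => y.score - x.score),
  generate: async (prompt) => `[stub] ${prompt}`,
};

answer("What did you build?", stubProviders).then((reply) => console.log(reply));
```

Because `answer` only sees the `Providers` shape, replacing one vendor means writing one new function with the same signature.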

Graceful degradation built-in

If Cohere is unavailable (for example, no API key is configured), the API falls back to pure vector retrieval. The same rule holds for every service: a missing key degrades that one feature, and the site never breaks.
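
The rerank fallback can be sketched as a thin wrapper around the rerank step. This is an illustration under assumptions, not the site's actual code: `withRerankFallback`, `makeReranker`, `RerankFn` and `VectorHit` are hypothetical names, the reranker body is a stub rather than a real Cohere call, and the sketch also falls back when the rerank call throws, which goes one step beyond the missing-key case.

```typescript
interface VectorHit { id: string; score: number }
type RerankFn = (query: string, hits: VectorHit[]) => Promise<VectorHit[]>;

interface RetrievalResult { hits: VectorHit[]; reranked: boolean }

// Returns reranked hits when a reranker is configured and healthy,
// otherwise the original vector-search order.
async function withRerankFallback(
  query: string,
  vectorHits: VectorHit[],
  rerank: RerankFn | undefined,
): Promise<RetrievalResult> {
  if (!rerank) return { hits: vectorHits, reranked: false }; // no key configured
  try {
    return { hits: await rerank(query, vectorHits), reranked: true };
  } catch {
    return { hits: vectorHits, reranked: false }; // provider error: degrade
  }
}

// Hypothetical wiring: only build a reranker when the key is present.
function makeReranker(apiKey: string | undefined): RerankFn | undefined {
  if (!apiKey) return undefined;
  // A real implementation would call the rerank provider here;
  // this stub just reverses the list so the effect is visible.
  return async (_query, hits) => [...hits].reverse();
}
```

Returning a `reranked` flag alongside the hits lets the caller log or display which path served the request without changing the response shape.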

Even past the free tiers, the cost is negligible

If traffic ever outgrows one of these free tiers, the next step up is $20–50 / month, still cheaper than a single hour of a human analyst.

Show, don't tell

This page exists because the cleanest way to demonstrate engineering judgement is to expose your trade-offs publicly. Transparency is a signal.

Want this kind of cost discipline applied to your team's AI infra? Let's talk →