This entire portfolio runs at $0 / month.
Live RAG against my CV. Streaming AI chat. Vector search. Cross-encoder reranking. Three LLM choices. Static deploys with previews. All of it on free tiers — and not in a hacky way. Here's the receipt.
| Service | Role | Tier | Cost / mo |
|---|---|---|---|
| Next.js 14 | App framework (App Router, Server Components) | Open source | $0 |
| Netlify | Hosting · CDN · build · preview deploys. 100 GB bandwidth + 300 build min / mo, well within limits. | Starter (free) | $0 |
| GitHub | Source control · CI trigger | Free | $0 |
| Pinecone | Vector database for RAG retrieval. 1 serverless index, ~36 CV chunks, < 0.1% of free quota. | Starter (free) | $0 |
| Google Gemini | Embeddings (gemini-embedding-001, 3072 dim). 60 RPM, 1,500 RPD; portfolio traffic is nowhere near either limit. | Free tier | $0 |
| Groq | LLM inference (Llama 3.3 70B + 3.1 8B + Gemma 2). Sub-second latency on Llama 3.3 70B. Ridiculous quality-to-cost ratio. | Free tier | $0 |
| Cohere | Cross-encoder reranking (rerank-v3.5). 1,000 calls / month free, enough for portfolio traffic for years. | Trial tier | $0 |
| Vercel Analytics + Speed Insights | RUM · Core Web Vitals | Hobby (free) | $0 |
| Total | | | $0 |
Best-of-tier for each job
Groq for fast LLM. Cohere for rerank. Pinecone for vectors. Gemini for embeddings. Each one is independently the best free tier in its category — and they compose cleanly.
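One way to picture "compose cleanly": keep each provider behind a small function type so that any stage can be swapped without touching the others. The sketch below is illustrative only, not the site's actual code. Every name in it (`Stages`, `answerFromCv`, the `topK` and `topN` defaults) is an assumption, and each stage would wrap the corresponding provider's SDK or HTTP API.

```typescript
// Hypothetical stage signatures; each one would wrap a single provider.
type Embed = (text: string) => Promise<number[]>; // e.g. Gemini embeddings
type Retrieve = (vector: number[], topK: number) => Promise<string[]>; // e.g. Pinecone query
type Rerank = (query: string, docs: string[]) => Promise<string[]>; // e.g. Cohere rerank
type Generate = (prompt: string) => Promise<string>; // e.g. Groq chat completion

interface Stages {
  embed: Embed;
  retrieve: Retrieve;
  rerank: Rerank;
  generate: Generate;
}

// Runs the four stages in order: embed -> retrieve -> rerank -> generate.
// topK and topN defaults are illustrative, not values from the site.
async function answerFromCv(
  question: string,
  stages: Stages,
  topK = 8,
  topN = 3,
): Promise<string> {
  const vector = await stages.embed(question);
  const candidates = await stages.retrieve(vector, topK);
  const ranked = await stages.rerank(question, candidates);
  // Only the top-N reranked chunks are placed in the prompt.
  const context = ranked.slice(0, topN).join("\n---\n");
  return stages.generate(`Context:\n${context}\n\nQuestion: ${question}`);
}
```

Because the stages are plain functions, they can be stubbed in tests or replaced by a different provider with the same signature.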
Graceful degradation built-in
If Cohere is missing, the API falls back to pure vector retrieval. If a key is missing, the feature degrades — the site never breaks.
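The fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the site's real handler: `ScoredChunk`, `Reranker`, and `retrieveWithFallback` are hypothetical names, and treating a failed rerank call the same way as a missing key is my assumption rather than something the page states.

```typescript
// Hypothetical shape of a retrieved CV chunk.
interface ScoredChunk {
  id: string;
  text: string;
  score: number; // vector similarity from the retrieval step
}

// Hypothetical reranker signature; a real one would call the rerank API.
type Reranker = (query: string, docs: ScoredChunk[]) => Promise<ScoredChunk[]>;

async function retrieveWithFallback(
  query: string,
  vectorMatches: ScoredChunk[],
  topN: number,
  rerank?: Reranker, // left undefined when no rerank key is configured
): Promise<{ results: ScoredChunk[]; reranked: boolean }> {
  // Pure vector ordering: highest similarity first.
  const byVectorScore = [...vectorMatches].sort((a, b) => b.score - a.score);

  if (!rerank) {
    return { results: byVectorScore.slice(0, topN), reranked: false };
  }
  try {
    const reranked = await rerank(query, byVectorScore);
    return { results: reranked.slice(0, topN), reranked: true };
  } catch {
    // Assumption: a failed rerank call degrades to vector order instead of erroring.
    return { results: byVectorScore.slice(0, topN), reranked: false };
  }
}
```

Returning a `reranked` flag alongside the results lets the caller log or display which path was taken without changing the response shape.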
If cost ever exceeds $0, it stays negligible
If traffic ever pushes past one of these free-tier limits, the next step up is $20–50 / month, still cheaper than a single hour of a human analyst.
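To know when a tier is getting close to its ceiling, a small headroom check is enough. The sketch below copies the limits from the table above. The key names, the 80% threshold, and any usage figures passed in are hypothetical, and collecting real usage numbers from each provider is out of scope here.

```typescript
interface Quota {
  key: string; // hypothetical identifier: "<service>.<metric>"
  limit: number; // free-tier limit as listed in the table
}

const FREE_TIER_QUOTAS: Quota[] = [
  { key: "netlify.bandwidthGbPerMonth", limit: 100 },
  { key: "netlify.buildMinutesPerMonth", limit: 300 },
  { key: "gemini.embedRequestsPerDay", limit: 1500 },
  { key: "cohere.rerankCallsPerMonth", limit: 1000 },
];

// Returns the keys whose usage is at or above `threshold` of the limit.
// Metrics missing from `usage` are treated as zero.
function tiersNearLimit(
  usage: Record<string, number>,
  threshold = 0.8,
): string[] {
  return FREE_TIER_QUOTAS
    .filter((q) => (usage[q.key] ?? 0) / q.limit >= threshold)
    .map((q) => q.key);
}
```

For example, with a made-up month of 850 rerank calls and 3 GB of bandwidth, only `cohere.rerankCallsPerMonth` would be flagged.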
Show, don't tell
This page exists because the cleanest way to demonstrate engineering judgement is to expose your trade-offs publicly. Transparency is a signal.
Want this kind of cost discipline applied to your team's AI infra? Let's talk →