Architecture Insights

Deep technical pieces from real production experience. No content marketing. No sponsored posts.
RAG · Production
Why 95% of RAG Demos Fail in Production
The seven failure modes that kill RAG systems between the notebook demo and production — with the exact fixes.
12 Jun 2026 · 9 min read
Protocols · MCP · A2A
MCP vs A2A: Which Protocol Should Your Enterprise Bet On?
MCP and A2A solve different problems — one connects agents to tools, the other connects agents to agents. A practical decision guide for enterprise architects, from someone running both in production.
10 Jun 2026 · 7 min read
Agentic AI · Architecture
AI Agents vs Agentic AI vs Workflows: What Your Enterprise Actually Needs
Most "agentic AI" projects fail because they use an agent where a workflow belongs. A field guide to choosing between workflows, single agents, and multi-agent systems, with the autonomy ladder I use with clients.
03 Jun 2026 · 6 min read
Security · AI Agents
Prompt Injection: Your AI Agent's Biggest Production Risk
The moment your agent reads untrusted content and holds real tools, every document becomes a potential attacker. The layered defense stack we deploy for financial-services agents — because there is no single fix.
27 May 2026 · 7 min read
RAG · Fine-Tuning · Architecture
RAG vs Fine-Tuning vs Long Context: The 2026 Decision Framework
Knowledge, behaviour, or workspace? Most teams pick the wrong technique because they never separated the three. The decision framework I use on enterprise engagements, with the cost curves that decide it.
20 May 2026 · 7 min read
Agentic AI · Orchestration
Multi-Agent Orchestration Patterns That Survive Production
In 2026, orchestration, not model choice, is where enterprise agentic value is won or lost. The supervisor pattern, parallel fan-out, and the failure-recovery machinery that separates production systems from demos.
13 May 2026 · 7 min read
LLMOps · Cost
The Real Cost of LLMs in Production: Token Economics for Enterprises
The demo cost ₹40 per query and nobody noticed. At 50,000 queries a day, the CFO noticed. The cost-optimization stack — routing, caching, context discipline — that cuts AI bills 60–90% without losing accuracy.
06 May 2026 · 7 min read
Evals · LLMOps
Evaluating AI Agents: Beyond Vibes and Demos
"We tried ten prompts and it looked good" has launched a thousand failing agent systems. How to build the evaluation harness — golden sets, trajectory checks, LLM judges, and gates — that lets you ship agents you can defend.
29 Apr 2026 · 7 min read
Agentic AI · Memory
Agent Memory: The Hardest Unsolved Problem in Agentic AI
Persistent memory is what separates an agent from a goldfish with tools — and it's where most agentic architectures quietly fall apart. Working, episodic, and semantic memory, and the governance problem nobody prices in.
22 Apr 2026 · 7 min read
Azure AI Foundry · Review
Azure AI Foundry: The Honest Enterprise Review
After deploying a bank compliance system on Azure AI Foundry: what genuinely works, what's marketing, and the gotchas the documentation doesn't mention. An architect's field review.
15 Apr 2026 · 8 min read
Fine-Tuning · Case Study
LoRA Fine-Tuning for Finance: How We Gained 31 Points Over the Base Model
How we fine-tuned a model on 12K RBI compliance examples and beat the base model by 31 percentage points — and why the win came from data preparation, not the training run.
08 Apr 2026 · 8 min read