What I Took Away from Not Another AI Podcast

I was recently a guest on Not Another AI Podcast, hosted by Piyanka Jain and Shanti Greene, where we talked about token optimization in AI systems. Most of what I brought to that conversation came from my year at Nirvana Insurance, where I was part of a small champions team that took AI adoption from around 20 percent of the company to 100 percent. That journey ran through every stage you would expect. Subscription sprawl. MCPs connected to everything at once. Context windows filling up before you had asked a single real question. And some expensive mistakes. One of those was the time I ran Opus on a thousand-page insurance document without understanding the settings and spent $80 in a single day (which sounds like a sale when you look at Opus 4.8 and Mythos prices). Leadership treated it as a learning moment rather than a firing offense, and that culture made it possible to actually figure out what worked. We landed on a custom platform built on top of Claude, crowd-sourced skills for every business function, and a skill routing system, built by engineering, that got the whole company using AI the right way.

The core of the discussion was the shift from loading full MCP servers to building specific custom functions, something Piyanka and Shanti articulated more cleanly than I had in my own head. Loading a full MCP server means loading every one of its tool definitions into the context window, whether you need them or not. Swapping those out for targeted functions can cut that context load by 90 to 98 percent. That is exactly what we started doing at Nirvana after noticing two things: sessions were beginning with 30 to 40 percent of the context window already consumed, and hallucinations on wide insurance datasets were getting worse.

Piyanka walked through how her platform, Inola, handles this differently at the structural level: it separates planning from execution entirely. You develop the full plan and vet it with the user before anything runs, which cuts down on iteration loops and controls token spend before the cost is incurred rather than after. Shanti added that model selection is its own optimization layer. Defaulting to Opus for everything is an expensive habit, and knowing when a smaller model is more than good enough is a skill worth building.
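The MCP arithmetic is easier to see with numbers attached. What follows is an illustrative sketch only, not anything we ran at Nirvana. It assumes a hypothetical 200,000-token context window, uses the rough four-characters-per-token rule of thumb instead of a real tokenizer, and invents every tool definition (the server prefixes and the `get_policy` and `summarize_claims` functions are hypothetical). It counts how much of the window tool definitions occupy before any question is asked, once with three verbose servers loaded in full and once with two targeted functions.

```python
import json
from dataclasses import dataclass, field

# Assumption: a 200k-token context window. Real limits vary by model.
CONTEXT_WINDOW_TOKENS = 200_000


def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token rule of thumb.
    A real tokenizer will give different counts."""
    return max(1, len(text) // 4)


@dataclass
class ToolDef:
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)

    def serialized(self) -> str:
        # Tool definitions reach the model as JSON, so measure the JSON form.
        return json.dumps(
            {"name": self.name, "description": self.description, "input_schema": self.input_schema}
        )


def context_tokens(tools: list[ToolDef]) -> int:
    """Tokens consumed by tool definitions before any user message is sent."""
    return sum(estimate_tokens(t.serialized()) for t in tools)


def verbose_server(prefix: str, n_tools: int) -> list[ToolDef]:
    """Hypothetical MCP-style server exposing many tools with long schemas."""
    schema = {
        "type": "object",
        "properties": {
            f"filter_{j}": {"type": "string", "description": "Optional filter. " * 5}
            for j in range(10)
        },
    }
    return [
        ToolDef(f"{prefix}_tool_{i}", "Queries a system of record. " * 40, schema)
        for i in range(n_tools)
    ]


# Everything loaded up front: three hypothetical servers, 40 tools each.
full_load = verbose_server("crm", 40) + verbose_server("docs", 40) + verbose_server("billing", 40)

# Targeted custom functions: only what this task needs (names are hypothetical).
targeted = [
    ToolDef("get_policy", "Return the policy record for a policy ID.",
            {"type": "object", "properties": {"policy_id": {"type": "string"}}}),
    ToolDef("summarize_claims", "Summarize open claims for a policy ID.",
            {"type": "object", "properties": {"policy_id": {"type": "string"}}}),
]

full_tokens = context_tokens(full_load)
targeted_tokens = context_tokens(targeted)
reduction = 1 - targeted_tokens / full_tokens

print(f"full load: {full_tokens} tokens ({full_tokens / CONTEXT_WINDOW_TOKENS:.0%} of window)")
print(f"targeted:  {targeted_tokens} tokens ({targeted_tokens / CONTEXT_WINDOW_TOKENS:.2%} of window)")
print(f"reduction: {reduction:.1%}")
```

The percentages it prints depend entirely on the invented sizes and are not measurements. The point is the shape of the calculation: tool definitions sit in the context window for the whole session, so the cost scales with everything loaded rather than with what the task actually uses.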

What I am taking back to Sodali is mostly about measurement. Shanti made a point that stuck with me: skills need version control, and you need evaluation criteria that are measurable before and after each change, not just a sense that the output "looks a little better." That is exactly what we had to develop on the fly at Nirvana. The crowd-sourced skill system was powerful, but the feedback loop on skill quality ran mostly on intuition. Starting fresh, I want to build testing harnesses from day one. Piyanka's planning approach, asking all the questions up front to minimize reruns, is a pattern I want embedded in how we scope AI workflows in the new role. The Nirvana chapter gave me the foundation: adoption, infrastructure, and a platform that held. This conversation reminded me that measurement and planning are where optimization actually lives.
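Shanti's point about versioned skills with before-and-after criteria can be sketched as a small harness. This is a minimal sketch under stated assumptions, not what Nirvana or Sodali runs: the skill, its test cases, and names like `SkillVersion`, `EvalCase`, and `compare` are all hypothetical, and deterministic stub functions stand in for model calls so the harness itself runs as written.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SkillVersion:
    """One versioned revision of a skill. `run` stands in for a model call
    made with this version of the skill's instructions loaded."""
    name: str
    version: str
    run: Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # a measurable pass/fail criterion


def expect(value: str) -> Callable[[str], bool]:
    """Build an exact-match check (a factory avoids late-binding bugs in loops)."""
    return lambda output: output.strip() == value


def pass_rate(skill: SkillVersion, cases: list[EvalCase]) -> float:
    passed = sum(1 for case in cases if case.check(skill.run(case.prompt)))
    return passed / len(cases)


def compare(old: SkillVersion, new: SkillVersion, cases: list[EvalCase]) -> dict:
    """Score both versions on the same fixed cases and gate on regressions."""
    before, after = pass_rate(old, cases), pass_rate(new, cases)
    return {
        "skill": new.name,
        "from": old.version,
        "to": new.version,
        "before": before,
        "after": after,
        "ship": after >= before,  # never ship a version that scores worse
    }


# Hypothetical skill: pull a policy number out of free text.
cases = [
    EvalCase("Policy POL-1234 renews in May.", expect("POL-1234")),
    EvalCase("Renewal for POL-5678 is due.", expect("POL-5678")),
    EvalCase("POL-9012 was cancelled.", expect("POL-9012")),
    EvalCase("No policy mentioned here.", expect("NONE")),
]

# Deterministic stubs in place of real model calls.
v1 = SkillVersion("extract_policy_number", "v1", lambda text: text.split()[1])


def _v2(text: str) -> str:
    match = re.search(r"POL-\d{4}", text)
    return match.group(0) if match else "NONE"


v2 = SkillVersion("extract_policy_number", "v2", _v2)

report = compare(v1, v2, cases)
print(report)  # before 0.25, after 1.0, ship True
```

In practice the stubs would be replaced by calls to the model with each skill version loaded, and exact-match checks would give way to whatever criteria the skill's owners agree are measurable. The design choice that matters is that both versions run against the same fixed cases, so the comparison is a number rather than a sense that the output looks a little better.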
