When I first started moderating panels about AI, the conversation was mostly theoretical. But sitting down with Kishore Aradhya (Head of Data Engineering and Architecture, Frontdoor), Eli Tsinovoy (Head of AI, UKG), Shafiq “SQ” Quorashi (Senior Android/ML Games Engineer, The New York Times), and Manish Nigam (Senior Director, AI, Ameriprise Financial), the discussion was refreshingly different.
These are the leaders who are actually building agentic AI systems at scale, and learning some hard truths along the way.
Here’s what impressed me most: Every panelist emphasized starting small. Not because they lack ambition, but because they’ve learned that crawling before walking actually works.
Frontdoor’s Kishore put it perfectly when he said that rushing into “fancy agentic frameworks” is a recipe for disaster. Instead, his team focuses on known problems with measurable results before tackling anything more complex, like automating insurance claim review.
That mentality is something we see often in our work at TrueFoundry, where I am the CEO and co-founder. We spend a lot of time with teams trying to move AI from experimentation to reliable, controlled systems in production.
Hosting this panel made it clear how quickly real-world barriers (governance, access, and infrastructure) become visible once models leave the demo stage.
The model access maze no one talks about
You would think that access to AI models would be easy by now. It isn’t. Each company on the panel took a completely different approach, and there’s a good reason for that.
Eli’s team at UKG routes everything through Google Cloud’s Vertex AI. It makes sense for them: they get the Model Garden’s features and can control token usage without building anything themselves. But here’s where it gets interesting: they’re already moving beyond basic controls toward LLM proxies for better routing and fallback models.
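To make that proxy pattern concrete, here is a minimal sketch of routing with fallback. The model list and the `client` interface are placeholders I made up for illustration, not UKG’s actual stack:

```python
import time

class FallbackRouter:
    """Try a primary model first; fall back to alternates on failure.

    A minimal sketch of the LLM-proxy pattern, not a production router.
    Model names and the `client` interface are illustrative placeholders.
    """

    def __init__(self, client, models, max_retries=1):
        self.client = client
        self.models = models          # ordered by preference, e.g. ["primary-model", "backup-model"]
        self.max_retries = max_retries

    def complete(self, prompt):
        last_error = None
        for model in self.models:
            for attempt in range(self.max_retries + 1):
                try:
                    return self.client.complete(model=model, prompt=prompt)
                except Exception as exc:  # in practice: catch rate-limit/timeout errors specifically
                    last_error = exc
                    time.sleep(2 ** attempt)  # simple exponential backoff before retrying
        raise RuntimeError(f"All models failed; last error: {last_error}")
```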
Meanwhile, Kishore’s team at Frontdoor runs everything through Snowflake. Why? Because they are a 50-year-old insurance company where governance means survival. Every action in their Snowflake ecosystem is traceable by default. No extra work needed.
The New York Times takes another approach. Shafiq explained that they actually maintain separate AI infrastructure for the newsroom and for business operations. Different needs, different tooling. The journalism side requires absolute fidelity (no hallucinations in news stories, thank you very much), while the business side can explore design patterns and experiment more freely with subscription management.
And Ameriprise? They use a managed-environment approach that lets them maintain their conservative financial-services posture while still innovating. As Manish put it, safety is paramount, but that doesn’t mean standing still.
Why are AI gateways becoming the new battlefield?
Here’s where the debate got heated: do we need specialized AI gateways, or can traditional API gateways handle the job?
Eli thinks API gateways could evolve to handle AI traffic. His reasoning? The network infrastructure is basically the same. Just add semantic concepts like “time to first token” instead of thinking in raw payloads. Keep it simple; avoid the “Maserati platforms that do everything.”
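His “time to first token” example is easy to make concrete. Here is a small sketch that wraps any streaming response and records TTFT; the stream interface is a generic stand-in, not a specific vendor API:

```python
import time

def stream_with_ttft(stream):
    """Wrap a token stream, recording time-to-first-token (TTFT).

    `stream` is any iterator of text chunks, e.g. from a streaming
    chat-completion call; the interface here is a generic stand-in.
    """
    start = time.monotonic()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # the metric an AI gateway would tag onto the request
        chunks.append(chunk)
    total = time.monotonic() - start
    return "".join(chunks), {"ttft_s": ttft, "total_s": total}
```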
But Manish pushed back hard on this. Traditional observability (tokens, latency, cost) works fine for basic chatbots. But agentic systems? That’s a different animal altogether.
Think about it: When an AI agent receives your request, it might:
- Decide which tools to use
- Call multiple APIs sequentially
- Access your file system
- Loop this process several times
- Coordinate with other agents
How do you trace all of that? How do you debug when something goes wrong? Traditional observability tools show you the what; agentic systems require you to capture the why: the logic, the decision tree.
This is not academic; it is a real problem these companies face every day. When your AI makes a decision about an insurance claim or a financial recommendation, you need to be able to explain that decision. Not only to your users, but also to auditors, regulators, and your own teams trying to improve the system.
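None of the panelists shared their tracing code, but the shape of the “why” is simple to sketch: each agent step records not just the action taken but the reasoning behind it and the alternatives it rejected. A minimal illustration (the actions and field names are my own, hypothetical choices):

```python
import json
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class DecisionSpan:
    """One node in an agent's decision tree: what it did and why."""
    action: str                                        # e.g. "call_tool:policy_lookup"
    reasoning: str                                     # why the agent chose this action
    alternatives: list = field(default_factory=list)   # options it considered and rejected
    children: list = field(default_factory=list)       # sub-decisions (tool calls, sub-agents)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])

trace = DecisionSpan(
    action="plan:review_claim",
    reasoning="Claim amount exceeds auto-approval threshold",
    alternatives=["auto_approve"],
)
trace.children.append(DecisionSpan(
    action="call_tool:policy_lookup",
    reasoning="Need coverage terms before deciding",
))
print(json.dumps(asdict(trace), indent=2))  # the "why", not just the "what"
```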
The tracing nightmare that keeps CTOs awake
Speaking of tracing, I want to share something that really surprised me: None of these companies have figured out enterprise-wide tracing yet. Not one.
Shafiq of The New York Times was completely honest about this. Different teams implement observability differently because their needs vary widely. A mobile app producing content for users requires different tracing than a centralized system distributing output to multiple endpoints. They are also exploring using AI to help monitor AI; meta, right?
The challenge grows once you realize that most observability platforms serve only specific personas. Engineers want traces. Data scientists want evaluation metrics. Product managers want user-behavior data. No one has built a platform that serves all of them well.
Eli shared a painful lesson: He evaluated several big-name platforms (he mentioned Arize and LangSmith, among others) and wasn’t really happy with any of them. The demos looked great: perfectly organized data, smooth workflows. But drop them into a real enterprise environment with multiple hops, proxies, and layers of infrastructure? Different story.
His advice? Test early. Push vendors past their demos with your real data and workflows. And pay attention to the boring things, like regulatory compliance and on-premises capabilities (which, by the way, are often “seven versions behind” the cloud offerings).
What does “agent” really mean in practice?
Everyone talks about AI agents, but definitions are all over the map. Manish offered the clearest one I’ve heard: An agent equals a model plus access to tools and memory. All three components. Miss one, and you’ve got something else.
It matters because it shapes how you build. Take The New York Times’ approach to code analysis. They use agents to identify where their design system has not been fully implemented across their codebase. That is a model (understanding code patterns) with tools (access to the repository) and memory (tracking what has been analyzed). Classic agent behavior.
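Manish’s definition translates almost directly into code. Here is a skeletal sketch of that model-plus-tools-plus-memory shape; the `model`, tool registry, and memory store are all placeholder interfaces, not anyone’s production design:

```python
class Agent:
    """Manish's definition, literally: a model plus tools plus memory."""

    def __init__(self, model, tools, memory):
        self.model = model    # anything with a generate(prompt) -> str method
        self.tools = tools    # name -> callable, e.g. {"repo_search": search_fn}
        self.memory = memory  # e.g. a plain list of prior observations

    def step(self, task):
        context = "\n".join(self.memory[-5:])  # recall recent observations
        decision = self.model.generate(f"{context}\n\nTask: {task}\nWhich tool?")
        tool = self.tools.get(decision.strip())
        result = tool() if tool else f"no tool matched: {decision!r}"
        self.memory.append(f"{task} -> {decision} -> {result}")  # remember what happened
        return result
```

Remove the tools and you have a chatbot; remove the memory and you have a stateless function call. Only all three together behave like an agent.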
But here’s what no one tells you about building agents: non-determinism stacks. The output of an LLM varies. Add tool calling? More variance. Coordinating multiple agents? You are now in what Shafiq calls the “undiscovered continent” of complexity.
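A back-of-envelope calculation shows why the stacking matters. If each step in a chain behaves as intended with probability p, end-to-end reliability decays geometrically (the numbers below are illustrative, not from the panel):

```python
# If each step succeeds independently with probability p,
# an n-step chain succeeds end-to-end with probability p**n.
p = 0.95
for n in (1, 3, 5, 10):
    print(f"{n:>2} chained steps at p={p}: {p**n:.0%} end-to-end")
# 1 step: 95%, 3 steps: 86%, 5 steps: 77%, 10 steps: 60%.
```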
This is why Kishore insists on starting with problems that humans have already solved. If a human can do it with reasonable accuracy, you have a baseline. You understand the decision process. You can measure improvement. But asking an agent to do something no one understands? That’s asking for trouble.
The MCP revolution everyone is watching
Just when enterprises are getting comfortable with current AI architectures, along comes MCP (Model Context Protocol) to shake things up. The panelists see this as potentially transformative but with caveats.
Manish explained why MCP matters: standardization. Before MCP, every integration with an external system was custom-built. Now there is a common framework for models to access files, call tools, and interact with APIs. Even better, that standardization extends to observability; if everything uses the same protocol, tracing becomes manageable.
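That standardization is concrete: MCP messages are JSON-RPC 2.0 with fixed method names such as tools/call, so a tool invocation looks the same no matter which server implements it. A hand-built example of the wire format (the tool name and arguments are invented):

```python
import json

# MCP messages are JSON-RPC 2.0; tools are invoked via the "tools/call" method.
tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_claim_file",            # hypothetical tool exposed by a server
        "arguments": {"claim_id": "C-1042"},  # made-up arguments for illustration
    },
}
print(json.dumps(tool_call, indent=2))
# Because every server speaks this shape, one tracer can log calls to all of them.
```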
But Eli is most excited about MCP’s memory capabilities. Not the basic session memory that everyone is implementing, but graph-based memory systems that can make conversations truly personal and relevant. As he said, “Everyone will start claiming that their systems have memory. But you have to ask: what kind?”
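Eli did not detail an implementation, but the “what kind of memory?” question is easy to see in miniature: session memory is a transcript, while graph memory stores entities and relationships you can traverse. A toy sketch:

```python
from collections import defaultdict

class GraphMemory:
    """Toy graph memory: (subject, relation, object) triples, queryable by entity."""

    def __init__(self):
        self.edges = defaultdict(list)

    def remember(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def recall(self, entity):
        return self.edges.get(entity, [])

mem = GraphMemory()
mem.remember("user:ana", "manages", "team:payroll")
mem.remember("team:payroll", "uses", "report:overtime")
print(mem.recall("user:ana"))  # [('manages', 'team:payroll')]
# A flat transcript can't answer "what does Ana's team use?"; a graph can, by hopping edges.
```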
The sobering part? If these companies started their AI initiatives today, they would build differently. Eli acknowledged that he spent a lot of time developing capabilities that MCP now provides out of the box. Connect an MCP server to Claude, and you can match what took his teams months to build.
Still, Kishore offered perspective: MCP is just a protocol, like TCP/IP. What matters is the semantic layer: the context engineering, the knowledge graph, and the meaning you build on top. He quoted Andrej Karpathy: “Context engineering beats prompt engineering.” The protocol is just plumbing.
Two obstacles no one wants to admit
When an audience member asked about the biggest barriers to agent adoption, the panelists’ answers were revealing.
First hurdle: alignment. Not technical alignment but human alignment. Manish stressed getting all the stakeholders into one room early: risk teams, business owners, and technical teams. Without unified agreement on the problem and the approach, you are creating expensive proofs of concept that never get implemented.
This is especially true for agentic systems, which fundamentally change workflows. Customer service representatives were not trained for AI assistance. This is change management. And change management at enterprise scale is brutal.
The second hurdle: starting with solutions instead of problems. Eli shared a cautionary tale: executives demanding that teams “make AI as big as possible.” His team built a lot. Most of it will never create value because it was technology looking for a problem.
His approach to avoiding this? Define what AI excels at in your industry:
- Automating repetitive tasks
- Surfacing unique insights
- Decision support
- Scenario simulation
Then work through your business problems the old-fashioned way: see where those capabilities actually help, and string small wins together into bigger systems.
Every enterprise needs a reality check
After moderating this panel, I am confident of three things:
One, the companies that succeed with AI agents aren’t the ones with the biggest budgets or flashiest technology. They’re the ones that start small, measure obsessively, and build on proven foundations.
Two, infrastructure matters more than models. Each panelist spent more time discussing gateways, observability, and protocols than the LLMs they used. The pipeline determines what you can build.
Three, we are still in the early days. When experienced teams at major enterprises admit that they are “figuring it out”, that’s just honesty. The playbook for enterprise AI agents is being written right now, in real time, by teams willing to share both successes and failures.
The way forward is not about revolutionary leaps. It’s about evolutionary steps. Crawl with simple automation. Walk with integrated workflows. Maybe then, and only then, run with fully autonomous agents.
As Kishore reminded us, there is always a human being somewhere in these loops who understands the task. Start there. Build from that understanding. And don’t let anyone convince you that applying agents to undefined problems is innovation.
It isn’t. It’s just an expensive experiment.
Real innovation happens when you match AI’s capabilities to real business needs, build the infrastructure to support it at scale, and create transparency that satisfies everyone from engineers to auditors. That’s not sexy. But that’s what actually works.
