Hey, I’m Philip, a Senior AI Developer Relations Engineer at Google DeepMind. My days revolve around making our models more accessible to developers, helping you build applications, chatbots, and agents with Gemini. But here’s what struck me during a recent talk: when I asked who had chatbots in production, hands went up everywhere. When I asked about agents? The room went quiet.
That difference tells us something important. We’re at this interesting juncture where AI is moving from answering questions to actually doing work. And if you’re still thinking agents are just fancy chatbots, let me share what I’ve learned about where we’re really headed.
Evolution from text completion to autonomous action
Remember when LLMs first came into vogue? They were essentially sophisticated autocomplete engines. You’d start with “My” and the model would predict “name”, continuing token by token until it had a complete sentence. Great party trick, sure. But not exactly revolutionary for commercial applications.
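To make that concrete, here’s a minimal sketch of greedy next-token decoding. The lookup table is a toy stand-in for a real model’s next-token distribution; none of these names come from an actual SDK.

```python
# Toy "model": a lookup table standing in for a real LLM's
# next-token distribution. Purely illustrative.
TOY_MODEL = {
    ("My",): "name",
    ("My", "name"): "is",
    ("My", "name", "is"): "Philip",
    ("My", "name", "is", "Philip"): "<eos>",
}

def predict_next_token(context: tuple[str, ...]) -> str:
    """Return the most likely next token for the given context."""
    return TOY_MODEL.get(context, "<eos>")

def complete(prompt: str, max_tokens: int = 20) -> str:
    tokens = tuple(prompt.split())
    for _ in range(max_tokens):
        next_token = predict_next_token(tokens)
        if next_token == "<eos>":  # the model decides the text is finished
            break
        tokens += (next_token,)
    return " ".join(tokens)

print(complete("My"))  # -> "My name is Philip", one token at a time
```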
The real change happened when we realized these models needed to follow instructions, not just complete text. Think about it: if you ask “What is the capital of France?” and your model continues with “What is the capital of Germany?”, that’s technically good text completion, but completely useless for the actual task.
So we taught these models to follow instructions. That was the first step. Then came chat interfaces (hello, ChatGPT), which made these systems conversational and user-friendly. But the game really changed when we introduced function calling: suddenly, our models could access and interact with external services.
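Here’s roughly what that looks like, as a minimal sketch using the google-generativeai Python SDK’s automatic function calling. The get_weather tool and its canned reply are placeholders I’ve made up for the example; a real tool would call an actual weather API.

```python
import google.generativeai as genai

def get_weather(city: str) -> str:
    """Return the current weather for a city (placeholder implementation)."""
    return f"Sunny and 22°C in {city}"

genai.configure(api_key="YOUR_API_KEY")

# Hand the model a tool it is allowed to call.
model = genai.GenerativeModel("gemini-1.5-flash", tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)

# The SDK detects the model's function call, runs get_weather("Paris"),
# and sends the result back so the model can answer in plain text.
response = chat.send_message("What's the weather like in Paris right now?")
print(response.text)
```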
Now? We’re entering the agent era. Instead of a back-and-forth conversation, we give the model a goal and let it run. It decides which tools to use, when to use them, and keeps working until the job is done. No hand-holding required.
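Stripped down, that’s just a loop. The sketch below is hypothetical, not any particular framework’s API: ask_model stands in for a single LLM call that returns either a tool call or a final answer, and the toy tool is made up for illustration.

```python
# A hypothetical agent loop — the pattern, not any framework's real API.

def search_web(query: str) -> str:
    """Placeholder tool; a real agent would hit a search API."""
    return f"(pretend search results for {query!r})"

TOOLS = {"search_web": search_web}

def ask_model(goal: str, history: list) -> dict:
    """Stand-in for one LLM call that decides the next step.
    This toy version searches once, then declares itself done;
    a real model would return a function call or a final answer."""
    if not history:
        return {"type": "tool", "tool": "search_web", "args": {"query": goal}}
    return {"type": "finish", "answer": f"Based on: {history[-1]['result']}"}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        step = ask_model(goal, history)
        if step["type"] == "finish":          # the model decides it's done
            return step["answer"]
        tool = TOOLS[step["tool"]]            # the model picked the tool...
        result = tool(**step["args"])         # ...and its arguments
        history.append({"step": step, "result": result})  # feed it back
    return "Stopped: hit max_steps"  # agents still need a hard stop

print(run_agent("What's the tallest building in Paris?"))
```

Notice where the control flow lives: your code just executes; the model decides what happens next on every pass through the loop.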
A Brief History Lesson: How We Got Here
The first real AI agent was probably OpenAI’s WebGPT. They literally sat humans down, watched them browse the web, recorded every search, every click, every snippet of extracted information, then fine-tuned GPT-3 on that dataset. Suddenly, GPT could browse the web using actions like search, browse, and click.
Then Meta released Toolformer, which taught LLMs to recognize when they needed external help. Ask a knowledge question? The model learned to search Wikipedia. Need to solve a math problem? It reached for a calculator.