he changed. The machines aren’t built yet, but the money is flowing: Companies and investors are set to invest $6.1 billion in humanoid robots in 2025 alone, four times more than they invested in 2024.
What happened? A revolution in how machines have learned to interact with the world.
Imagine you want a pair of robot arms installed in your home to do one thing only: fold clothes. How will they learn to do it? You can start by writing rules. Test the fabric to find out how much deformation it can tolerate before tearing. Identify the shirt collar. Place the gripper on the left sleeve, lift it, and fold it inward exactly a set distance. Repeat for the right sleeve. If the shirt rotates, adjust the plan accordingly. If a sleeve is bent, straighten it. The number of rules explodes very quickly, but a complete accounting of them can produce reliable results. This was the basic craft of robotics: anticipating every possibility and encoding it in advance.
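To see why the rules pile up, here is a drastically simplified, purely illustrative sketch of the rule-based approach. Every function and condition name is hypothetical, not drawn from any real robotics system; the point is that each new situation the shirt can be in demands another hand-written branch.

```python
def fold_shirt(shirt):
    """Plan folding actions for a shirt described as a dict of observed
    conditions. Every condition must be anticipated with an explicit rule."""
    actions = []
    if shirt.get("rotated"):                      # rule: re-align first
        actions.append("rotate_to_canonical_pose")
    for sleeve in ("left", "right"):
        if shirt.get(f"{sleeve}_sleeve_bent"):    # rule: straighten bent sleeves
            actions.append(f"straighten_{sleeve}_sleeve")
        actions.append(f"grip_{sleeve}_sleeve")
        actions.append(f"fold_{sleeve}_sleeve_inward")
    actions.append("fold_bottom_up")
    return actions

# A shirt that arrives rotated with one bent sleeve triggers extra rules:
print(fold_shirt({"rotated": True, "left_sleeve_bent": True}))
```

Each new wrinkle, literally, means another `if` branch, which is why the rulebook grows without bound.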
Around 2015, the cutting edge started doing things differently: create a digital simulation of the robot arms and the clothing, and have the program issue a reward signal every time the arms fold successfully and a penalty every time they fail. The system gets better by trying all kinds of techniques through trial and error, over millions of iterations, in the same way that AI became good at playing games.
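The trial-and-error idea can be sketched in a few lines. This is a toy stand-in, not a real physics simulator: the action names and their hidden success rates are invented for illustration, and the agent simply keeps a running estimate of how rewarding each action is.

```python
import random

random.seed(0)

ACTIONS = ["grab_collar", "grab_sleeve", "fold_inward", "shake"]
# Hidden from the agent: how often each action actually succeeds.
SUCCESS_RATE = {"grab_collar": 0.1, "grab_sleeve": 0.2,
                "fold_inward": 0.9, "shake": 0.05}

value = {a: 0.0 for a in ACTIONS}   # the agent's learned estimates
counts = {a: 0 for a in ACTIONS}

for episode in range(5000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: value[a])
    # Reward for success, penalty for failure.
    reward = 1.0 if random.random() < SUCCESS_RATE[action] else -1.0
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]  # running mean

print(max(ACTIONS, key=lambda a: value[a]))
```

After a few thousand simulated attempts, the agent settles on the action that earns the most reward, without anyone ever writing a folding rule.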
The arrival of ChatGPT in 2022 catalyzed the current surge. Trained on large amounts of text, large language models work not through trial and error but by learning to predict which word should come next in a sentence. Similar models adapted for robotics were soon able to absorb pictures, sensor readings, and the positions of the robot’s joints and predict the machine’s next action, issuing dozens of motor commands every second.
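The interface such a model presents can be sketched as follows. Everything here is a hypothetical placeholder: in a real system, `predict_next_actions` would be a trained neural network that predicts the next action the way a language model predicts the next word, rather than the trivial stand-in below.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """One snapshot of everything the robot can sense."""
    camera_pixels: bytes        # what the robot sees
    force_sensors: List[float]  # touch/pressure readings
    joint_positions: List[float]

def predict_next_actions(obs: Observation) -> List[float]:
    """Stand-in for a learned policy. A real model would map the full
    observation to motor commands; this placeholder just nudges each
    joint toward zero so the sketch runs."""
    return [-0.1 * p for p in obs.joint_positions]

# The control loop calls this dozens of times per second.
obs = Observation(camera_pixels=b"", force_sensors=[0.0],
                  joint_positions=[0.5, -0.2, 1.0])
commands = predict_next_actions(obs)
print(commands)
```

The key shift is the signature: observations in, the next batch of motor commands out, with no hand-written rules in between.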
This conceptual shift – the reliance on AI models that ingest large amounts of data – appears to be working, whether the task is assistant robots talking to people, navigating an environment, or even performing complex manipulation. It was paired with other ideas suited to this new way of learning, such as deploying robots before they are perfect so they can learn from the environments in which they have to work. Today, Silicon Valley roboticists are dreaming big again. Here’s how it happened.
Jibo
A dynamic social robot that interacted with people long before the era of LLMs.
An MIT robotics researcher named Cynthia Breazeal introduced Jibo, an armless, legless, faceless robot that looked rather like a lamp, to the world in 2014. Breazeal aimed to create a social robot for families, and the idea raised $3.7 million in a crowdfunding campaign. The initial pre-order price was $749.
The early Jibo could introduce itself and dance to entertain children, but that was about it. The goal was always for it to become a kind of embodied assistant that could handle everything from scheduling and email to telling stories. It acquired many dedicated users, but the company eventually shut down in 2019.
Courtesy of MIT Media Lab
Looking back, the one thing Jibo really needed was better language capabilities. It was competing against Apple’s Siri and Amazon’s Alexa, and at the time all of these technologies relied heavily on scripting. Broadly speaking, when you talked to them, the software would transcribe your speech to text, parse what you wanted, and generate a response pulled from pre-approved snippets. Those bits may have been catchy, but they were also repetitive and boring: completely robotic. That was a particular problem for a robot meant to be social and family oriented.
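The scripted pipeline can be sketched in miniature. This is purely illustrative, not the actual code behind Siri, Alexa, or Jibo: transcribed speech is matched to an intent by keyword, and the reply is drawn from a small pool of canned snippets, which is exactly why the answers repeat.

```python
import random

# Hypothetical intents and keywords for illustration.
INTENTS = {
    "weather": ["weather", "rain", "sunny"],
    "greeting": ["hello", "hi", "hey"],
}

# The same few pre-approved replies, every time.
SNIPPETS = {
    "weather": ["Here's the forecast.", "Let me check the weather."],
    "greeting": ["Hello!", "Hi there!"],
    "unknown": ["Sorry, I didn't catch that."],
}

def respond(transcribed_speech: str) -> str:
    """Match the transcript to an intent by keyword, then pick a canned reply."""
    words = transcribed_speech.lower().split()
    for intent, keywords in INTENTS.items():
        if any(word in keywords for word in words):
            return random.choice(SNIPPETS[intent])
    return random.choice(SNIPPETS["unknown"])

print(respond("hello robot"))  # always one of the two canned greetings
```

No matter how many times you greet it, the system can only ever say one of its two greeting lines, which is the "completely robotic" feel the scripted approach could not escape.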