Google's new anything-to-anything AI model is extremely unique

Last year I deepfaked my kid’s stuffed animal to make it look like his plush deer was on vacation.

It was an experiment to see if I could recreate the events depicted in the Gemini ad run by Google, and I have never shown the videos of Buddy the deer’s adventures to my four-year-old. But it was a revealing exercise that made me think a lot about the difference between using generative AI for some harmless entertainment and full-on slop. Maybe the Venn diagram is a perfect circle! Probably not. But I definitely know that the tools for creating realistic videos are amazingly good, requiring surprisingly little effort and know-how. And this trend continues in the Omni era of Gemini.

Omni is a new family of generative models that will reportedly one day be able to transform any type of input – photos, videos, text – into anything else. But for starters, it’s just making videos. Omni Flash is the first of these models that Google has released, and it is now available in Flow, the company’s AI video generation and editing platform. You can still use the previous model, Veo, if you want, but Omni improves on Veo in a few ways.

With Omni, you can upload a video and use it, along with text prompts, as the starting point for your AI-generated creation. Google also claims that Omni incorporates more real-world knowledge when creating videos and can do a better job of keeping characters consistent throughout the video as a result. There was really only one way to find out if those claims were true: I brought AI Buddy back to pack his little AI-generated backpack for another adventure.

The results were surprisingly mixed. Some were very good – much more consistent and in line with my prompt than when I was testing Veo five months ago. But even the best clips that Omni produced for me still had a few AI jump scares, such as when Buddy suddenly changes orientation while skydiving.

For another video, I gave Omni some artistic freedom. “Create a montage of Buddy packing for the holidays and boarding a cruise ship for a tropical vacation. The mood is cute and playful. Buddy packs something fun in his suitcase that appears later in the clip.” Buddy packed a jar of honey; later in the clip he holds it as if it were a bottle of sunscreen. “Uh oh,” the character says, squirting honey on his hoof.

Honestly, not bad. Except that the honey container changes constantly throughout the video, from a jar, to a clear squirt bottle filled with water, then to a squeeze bottle filled with honey. And I can’t even begin to explain how the model came up with the final frame of the video – almost as if it just assembled a bunch of elements from the sequence it created.

You can use text-based prompts to suggest edits to your videos, and I’ll give Google credit: it works better with Omni than it did when I tested Veo 3. The results were bad with Veo – so bad that I found it easier to just start a new video whenever I wanted to change something. Omni will indeed take your edits on board, but the results are not always effective.

I tried to emphasize Buddy’s facial reactions in his vacation clips, and the results looked downright weird. The model also gave him big antlers from time to time, which he does not have. He is a child, thank you very much. When I prompted it to remove the antlers that were visible in one scene, it obliged – and then removed all the other antlers.

The thing is, none of this is free. Creating a video costs between 15 and 40 credits, depending on the length of the scene and the “content” you create. One round of editing costs 40 credits. I have the $20 per month AI Pro plan, which comes with 1,000 credits per month. After generating about 20 clips with some edits, I’m down to 145. If you have specific ideas about the video you want Omni to produce, you may be looking at a lot of expensive fiddling with the model to get a video that is close to your vision.
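For a rough sense of what those numbers add up to, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above; the function names are made up for illustration, and it assumes that “about 20 clips” means exactly 20 and that every spent credit went to either generating a clip or a round of editing.

```python
import math

# Figures quoted in the article.
MONTHLY_CREDITS = 1000       # credits included in the plan each month
CREDITS_LEFT = 145           # balance after testing
PLAN_PRICE_USD = 20.0        # monthly price of the plan
GEN_COST_MIN, GEN_COST_MAX = 15, 40   # credits per generated clip
EDIT_COST = 40               # credits per round of editing

# Assumption: "about 20 clips" is treated as exactly 20.
CLIPS = 20


def credits_spent(monthly=MONTHLY_CREDITS, left=CREDITS_LEFT):
    """Credits used so far this month."""
    return monthly - left


def average_credits_per_clip(spent, clips=CLIPS):
    """Average credits per finished clip, edits included."""
    return spent / clips


def usd_per_clip(spent, clips=CLIPS):
    """Approximate dollar cost per finished clip at the plan's credit price."""
    usd_per_credit = PLAN_PRICE_USD / MONTHLY_CREDITS
    return average_credits_per_clip(spent, clips) * usd_per_credit


def edit_round_bounds(spent, clips=CLIPS):
    """Fewest and most edit rounds consistent with the quoted price range.

    Assumes all spent credits went to generation or editing.
    """
    most_on_generation = clips * GEN_COST_MAX
    least_on_generation = clips * GEN_COST_MIN
    fewest = max(0, math.ceil((spent - most_on_generation) / EDIT_COST))
    most = max(0, math.floor((spent - least_on_generation) / EDIT_COST))
    return fewest, most


if __name__ == "__main__":
    spent = credits_spent()
    print("credits spent:", spent)
    print("average credits per clip:", average_credits_per_clip(spent))
    print("approx. USD per clip:", round(usd_per_clip(spent), 3))
    print("edit rounds (fewest, most):", edit_round_bounds(spent))
```

Under those assumptions, a finished clip averages a bit more than the 40-credit ceiling for a single generation, which fits the point that editing is where the credits go.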

One of Omni’s purported powers is to add AI-generated content to real videos, so I gave Buddy a break and deepfaked myself. Starting with a selfie video with a neutral expression, I prompted Omni to create videos of me eating a plate of spaghetti, sitting in an airplane seat, and snacking on a baguette while standing in front of the Eiffel Tower. And I can truly say that I was not prepared for what I saw.

My deepfake videos have some AI tells. The clink of a fork hitting a bowl of pasta is a little too artificial. There is a woman who appears twice in the background of the airplane video. But apart from those minor glitches and a vaguely weird feel to them, they’re absolutely convincing.

I showed the pasta clip to my husband. He knew I was testing an AI video tool, but I didn’t tell him what was generated by the AI in the scene. Without knowing what was AI-generated about it, he assumed I really was eating pasta while sitting in front of the camera, and said his only clue that something was off was that the bowl looked unfamiliar. The pasta eating seemed real enough to be believed. My husband. A man who has seen me in real life basically every single day for the last decade.

My other deepfakes reach varying degrees of “good enough to fool people on social media.” Some of the Eiffel Tower clips look a bit cartoonish, but one of them is so impressive that you may need to re-watch it a few times to see that it’s AI. I know it’s not me when AI me turns her head and shows her hair pulled back into a ponytail. But I’m not sure anyone else would know the difference, and that makes me feel weird.

To be honest, I’m a little tired of it all. I was amazed by the realism when I tested Veo 3. I’ve been amazed over and over again through the years at how easy it has become to show fake people in fake photos. I probably should have been shocked by Omni too, and I guess I was, but the edge has worn off.

Creating an AI-generated cinematic masterpiece still isn’t as easy as Google wants you to believe. But Omni improves on Veo in some recognizable ways. If you have a Google account and a credit card, you can take a video of yourself sitting at home and with a little effort make it look like you’re on a flight to Maui. I don’t think we’re really at the “foothills of the Singularity,” but we’re certainly deep in the uncanny valley.

All images and videos in this story were produced by Google Gemini.

Allison Johnson
