Date: 2024-03-18

The team at Google has just unveiled something pretty wild in the world of AI. They've come up with a new system called VLOGGER that can take a single photo of someone and turn it into a video of that person talking, gesturing, and moving around. It's all thanks to some seriously smart machine learning that makes these videos look strikingly realistic.

This is a useful tool for creators: a character built from a single image can be reused across scenes and platforms, which makes it easier to tell continuous stories. Think of movies, video games, marketing, influencers, or any kind of storytelling. But it also gets you thinking about the whole deepfake thing and how this could be misused to spread false info.

In their paper, "VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis," the researchers explain how they do it. Give the AI a picture and some audio, and voilà, it creates a video where it looks like the person in the photo is saying what's in the audio. The person moves their face and hands just like they would if they were really talking. Sure, it's not 100% perfect and you might spot a few quirks, but it's a huge step forward in making still images come to life.

From the VLOGGER paper, on how the creation process works:

“VLOGGER is a novel framework to synthesize humans from audio. Given a single input image like the ones shown on the first column, and a sample audio input, our method generates photorealistic and temporally coherent videos of the person talking and vividly moving. As seen on the synthesized images in the right columns, we generate head motion, gaze, blinking, lip movement and unlike previous methods, upper-body and hand gestures, thus taking audio-driven synthesis one step further.”

Let that simmer in your brain as you dream up new types of content you can generate with a tool like this. 👊



