
Creating complex deepfakes from a single input image - how advanced is this technology currently?

Recently, on the bycloud channel, the topic of hilarious memes created by artificial intelligence was discussed. The author pointed out that deepfakes were one of the earliest memeable AI technologies, and that more and more AIs are now being used comedically. He also mentioned the First Order Motion Model, known for performing motion transfer from a single source image, though it works best on human faces; its demo did include some horse and full-body motion transfers, but their quality was not up to par with the facial results.

Moving on to the latest AI news, the author covered a recent research paper titled Liquid Warping GAN, whose source code was open-sourced shortly after publication. The official demo proved incredibly convincing, showcasing strong results in motion capture, 3D body mapping, image inpainting, and other details that are individually hard to get right. Even though it is limited to a single person at a time, its results are astonishing.

The author shared his impressions of this AI, which makes motion transfer from a single full-body photo and a reference video possible with just one (fairly lengthy) command. Besides its meme potential, he emphasized that this AI could benefit game development, character animation, and even virtual try-ons thanks to its appearance transfer feature. He also noted some humorous results: when the AI is given too little information about a person's body, it produces amusing errors such as pasting the front of the body onto the back.
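
To make the "one full-body photo plus a reference video, one command" idea more concrete, here is a minimal conceptual sketch of what such a pipeline could look like. This is not the paper's actual code and the file names are made up: the pose-and-flow estimation is stubbed out with an identity mapping, and only the appearance-warping step (cv2.remap) and the video I/O use real OpenCV calls.

```python
# Conceptual sketch only: a per-frame "warp the source appearance to the driving pose" loop.
import cv2
import numpy as np

def estimate_warp_to_driving_pose(source_img, driving_frame):
    """Placeholder: a real system would fit a 3D body model to both images and
    derive a dense correspondence field. Here we just return the identity mapping."""
    h, w = source_img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    return xs, ys  # for each output pixel, where to sample from in the source image

def motion_transfer(source_path, driving_video_path, out_path):
    source = cv2.imread(source_path)                      # the single full-body photo
    cap = cv2.VideoCapture(driving_video_path)            # the reference (driving) video
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    h, w = source.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, driving_frame = cap.read()
        if not ok:
            break
        map_x, map_y = estimate_warp_to_driving_pose(source, driving_frame)
        # Warp the source appearance so the person follows the driving pose.
        warped = cv2.remap(source, map_x, map_y, cv2.INTER_LINEAR)
        writer.write(warped)
    cap.release()
    writer.release()

if __name__ == "__main__":
    # Hypothetical file names, just to show the "one command" shape.
    motion_transfer("person_front.jpg", "dance_reference.mp4", "transferred.mp4")
```

In the real system, the stubbed function is where the heavy lifting happens (the 3D body mapping and warping the demo shows off); the sketch only illustrates the overall input/output shape of the command.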

Another interesting aspect addressed in the video was novel view synthesis, in which the AI generates information about perspectives that are not present in the input, such as a person's side or back. He compared this to how PIFuHD generates 3D models from a single image: since that is model generation rather than skin generation, the front image ends up being reused for the back. This AI, however, can take both a front and a back photo, which avoids the double-sided-face artifact, and he noted it would be even more useful if the result could be extracted as a 3D model. Despite some outputs looking reminiscent of early-2000s character graphics, a rendering time of only about three minutes is a significant advance over animating such motion by hand.
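
To see why a single front photo produces that "front pasted on the back" look, here is a toy sketch (an illustration of the idea, not the paper's implementation) that treats the body texture as two halves of one atlas: with no back photo, the unseen half is filled by mirroring the front, which is exactly the double-sided effect described above, while supplying a real back image replaces that fallback.

```python
# Toy illustration of the single-view fallback behind the "front on the back" artifact.
from typing import Optional
import numpy as np

def build_texture(front_half: np.ndarray, back_half: Optional[np.ndarray]) -> np.ndarray:
    """Assemble a full texture atlas from the observed views.

    front_half: (H, W, 3) pixels observed from the front view.
    back_half:  (H, W, 3) pixels observed from the back view, or None if only a
                single front photo is available.
    """
    if back_half is None:
        # Novel view synthesis has no real information about the back, so the naive
        # fallback is to mirror the front pixels -- this is what creates the
        # double-sided look (a face on both sides of the head, etc.).
        back_half = front_half[:, ::-1, :]
    return np.concatenate([front_half, back_half], axis=1)

if __name__ == "__main__":
    front = np.random.randint(0, 255, (256, 128, 3), dtype=np.uint8)  # stand-in for a front photo
    back = np.random.randint(0, 255, (256, 128, 3), dtype=np.uint8)   # stand-in for a back photo

    single_view_atlas = build_texture(front, None)  # back half is a mirrored copy of the front
    two_view_atlas = build_texture(front, back)     # back half uses real observations instead
    print(single_view_atlas.shape, two_view_atlas.shape)  # (256, 256, 3) twice
```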

At the end of the video, the author emphasized that the demo is also provided in 1080p, showing that this technology is not limited to low resolutions the way the First Order Motion Model is. With the video sitting at 1,850,212 views and 60,536 likes at the time of writing, we can certainly expect a surge of memes made with this new AI. He encourages viewers to experiment with the tech and share their results on his Discord, and also mentions the sponsor Infinite Red, a consulting team for mobile, web, and AI app development.

Timeline summary

  • 00:00 Introduction to AI-generated memes.
  • 00:04 Discussion of early AI like deepfakes.
  • 00:08 Increasing comedic use of AI technologies.
  • 00:16 Reference to the first order motion model.
  • 00:20 Capabilities of deepfake technology.
  • 00:37 Introduction to Liquid Warping GAN.
  • 00:42 Release of open-sourced codes for the AI.
  • 00:49 High-quality motion capturing capabilities.
  • 01:09 Conviction in the technology's motion transfer quality.
  • 01:31 AI's applications in game development and character animation.
  • 01:48 Hilarious results from using the AI.
  • 02:01 Explaining novel view synthesis technology.
  • 02:17 Limitations relating to skin generation.
  • 02:38 Potential improvements in generating 3D models.
  • 02:58 Comparison of rendering times with past animation methods.
  • 03:18 Anticipation for future technologies and memes.
  • 03:31 Link to installation tutorial for the AI.
  • 03:39 Acknowledgment of sponsorship by Infinite Red.
  • 03:58 Conclusion and thanks to supporters.

Transcription

Lately, we have been seeing some pretty hilarious memes that are made with AIs. Looking back, one of the earliest memeable AIs was definitely deepfakes, and now more and more AIs are being used comedically, and they are just too much fun. Remember the first order motion model that I covered a few months ago? It does a super good job at deepfaking with only one source image, however it really only works well on human faces. In their demo, they did show some horse motion transfer and human movement transfer, but it was not as good as the facial motion transfer.

So, onto the latest AI news. Three weeks ago, an AI research paper called Liquid Warping GAN was published, and just a few days ago, they open-sourced their code. First of all, their official demo looks too good to be true. It looks insanely good at motion capturing, mapping 3D bodies, image inpainting, and many other details that have proven difficult to get working on their own, and blending all of these into one would normally make us expect something less functional. But after they released their code, I am absolutely convinced. It is just inconceivably good at motion transfer, even if it is limited to just one single person.

So you can basically take any full-body shot of anyone with clearly defined body parts and use a reference video to perform motion transfer with just one single command, albeit a fairly lengthy one. Other than the incredible meme potential, this AI can be used in areas such as game development and character animation, and it can even be used for trying on clothes with its appearance transfer function, which sounds amazing if it is developed even further. Looking at my results, they are just hilarious. If not enough information about a person's body is presented, the AI will instead paste the front of the body onto their back, which creates such a monstrosity.

This is also called novel view synthesis, where the AI generates new information about perspectives that are not present in the input, like the side view of a person or the backside. This is similar to how PIFuHD generates its 3D model, but since that is just model generation, not skin generation, it uses the front image instead. In this case, the author can't generate new information about the skin on the back, so they just apply the same image from the front. However, this can easily be improved, as the author made it so the AI can take in both the front-side and back-side image information and generate a full 3D model for motion transfer, so it wouldn't have the problem of a double-sided face. This would definitely be really beneficial if it could be extracted as a 3D model too.

Even though the generated output does look a bit too rounded, like some of that 2000s game graphics for characters, hey, this only takes like three minutes to render, and 20 years ago people still needed to animate this themselves, which can take way longer than 5 minutes. The official demo also provides a 1080p version, which shows how it is not limited like FOM to just 500x500 pixels. Compared to their previous work, this is a huge step up and a big milestone for them, and it also means we will soon have some better technologies and memes to enjoy. Not gonna lie, it is only a matter of time before a meme made with this AI takes over the internet. So if you want to play around with this AI yourself, I'll link my installation tutorial from my spam channel down in the description.
This video is sponsored by Infinite Red. Infinite Red Consulting handles your mobile, web, and AI needs. If you are looking for someone to build your app, visit them via the link down in the description. If you have any questions about the installation or want to share your results, you can head over to my Discord channel. A big shoutout to Mark Finn and the many other patrons who support me on Patreon. Follow me on Twitter if you haven't already, and I'll see you all in the next one.