Many examples have been published
Microsoft researchers have developed a new system, VASA-1, that can generate realistic talking faces from a single still image and an audio track.
VASA-1 can generate facial expressions, lip movements precisely synchronized with the audio, and natural head movements. The neural network captures a wide range of emotions and subtle nuances, which makes the generated faces more believable. Users can also control the character's gaze direction, its distance from the camera, and even its emotional state.
VASA-1 achieves this realism by representing facial appearance, 3D head pose, and facial expressions as separate, disentangled components. The researchers behind VASA-1 emphasize that the system runs in real time: it can generate 512 × 512 pixel video at up to 45 frames per second.
Many examples of the technology in action are available on the project's official website.