A Neural Lip-Sync Framework for Synthesizing High-Resolution, Photorealistic, and Pose-Adaptive Virtual News Anchors
We develop a neural framework for producing speech-driven virtual news anchors. Our goal is to synthesize high-resolution, photorealistic, natural-looking videos from the input audio and a target role. The framework can benefit a variety of scenarios, including video editing, the film industry, game development, and smart agents.
Key Components
- Audio-to-mouth mapping learned with adversarial temporal convolutional networks
- High-resolution and photorealistic neural rendering
- Background matching and motion editing
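The first component names an adversarial temporal convolutional network (TCN) for the audio-to-mouth mapping. The sketch below is not the project's model. It is a minimal, dependency-free illustration of the generic TCN building block: a stack of causal, dilated 1D convolutions with residual connections that turns a sequence of per-frame audio features into a sequence of per-frame mouth parameters. The feature size (13, as for MFCCs), the four output "mouth parameters", the layer widths, and the random weights are all hypothetical. The adversarial part (a discriminator judging generated mouth sequences) and all training are omitted.

```python
import random
from typing import List, Tuple

# Hypothetical data layout for this sketch: plain lists, no tensor library.
Frames = List[List[float]]        # shape [time][channels]
Kernel = List[List[List[float]]]  # shape [kernel_tap][out_channel][in_channel]
Layer = Tuple[Kernel, List[float], int]  # (weights, bias, dilation)


def causal_dilated_conv1d(x: Frames, w: Kernel, b: List[float], dilation: int) -> Frames:
    """Output frame t sees only x[t], x[t - d], x[t - 2d], ... (zero-padded on the left)."""
    y = []
    for t in range(len(x)):
        row = list(b)
        for k, tap in enumerate(w):
            src = t - k * dilation
            if src < 0:
                continue  # implicit left zero-padding: never reads future frames
            for o, w_o in enumerate(tap):
                row[o] += sum(wi * xi for wi, xi in zip(w_o, x[src]))
        y.append(row)
    return y


def relu(x: Frames) -> Frames:
    return [[max(0.0, v) for v in row] for row in x]


def random_kernel(rng: random.Random, taps: int, out_ch: int, in_ch: int) -> Kernel:
    return [[[rng.gauss(0.0, 0.3) for _ in range(in_ch)] for _ in range(out_ch)]
            for _ in range(taps)]


def build_tcn(rng: random.Random, in_ch: int, hidden: int, out_ch: int,
              kernel_size: int = 2, dilations=(1, 2, 4)) -> List[Layer]:
    """Hypothetical architecture: dilated conv layers, then a 1-tap output projection."""
    layers, ch = [], in_ch
    for d in dilations:
        layers.append((random_kernel(rng, kernel_size, hidden, ch), [0.0] * hidden, d))
        ch = hidden
    layers.append((random_kernel(rng, 1, out_ch, ch), [0.0] * out_ch, 1))
    return layers


def tcn_forward(x: Frames, layers: List[Layer]) -> Frames:
    """Map audio frames to mouth-parameter frames; output length equals input length."""
    h = x
    for i, (w, b, d) in enumerate(layers):
        out = causal_dilated_conv1d(h, w, b, d)
        if i < len(layers) - 1:  # hidden layers: ReLU plus residual when widths match
            out = relu(out)
            if len(out[0]) == len(h[0]):
                out = [[a + c for a, c in zip(ro, rh)] for ro, rh in zip(out, h)]
        h = out
    return h


def receptive_field(kernel_size: int, dilations) -> int:
    """Number of input frames that can influence one output frame."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    rng = random.Random(0)
    net = build_tcn(rng, in_ch=13, hidden=8, out_ch=4)
    audio = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
    mouth = tcn_forward(audio, net)
    print(len(mouth), len(mouth[0]))  # prints "20 4"
```

With kernel size 2 and dilations 1, 2, 4, each output frame depends on the current and the previous seven audio frames (receptive field 1 + (2 - 1)(1 + 2 + 4) = 8). Doubling the dilation per layer is the usual way a TCN widens its audio context without adding many layers, and causality means each mouth frame depends only on the current and past audio frames.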
Publications
- “A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors”, accepted at ICPR 2020.
- “ObamanetHD: Towards High-Resolution and Pose-Adaptive Lip Sync from Speech”, Under review.
Applications
- Smart Quiz App