Robin Zheng

A Neural Lip-Sync Framework for Synthesizing High-Resolution, Photorealistic, and Pose-Adaptive Virtual News Anchors

We develop a neural framework for producing speech-driven virtual news anchors. Our goal is to synthesize high-resolution, photorealistic, visually natural videos from audio content and a target role. Our solution can benefit a variety of scenarios, including video editing, the film industry, game development, and smart agents.

Key Components

Audio-to-mouth mapping learned via adversarial temporal convolutional networks
High-resolution and photorealistic neural rendering
Background matching and motion editing
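The first component maps audio to mouth shapes with temporal convolutional networks. The sketch below shows only the general idea of a temporal convolutional network: a stack of causal, dilated 1-D convolutions that turns a sequence of per-frame audio features into a same-length sequence of mouth features. It is not the authors' model. The adversarial training is omitted, and the weights are random. Everything here is a hypothetical assumption for illustration: the function names, the 13-dimensional audio features (MFCC-like), the 40-dimensional mouth output (for example, 20 landmarks times 2 coordinates), the kernel size, and the dilation schedule.

```python
import random
from typing import List

# Time-major sequence: seq[t][channel]
Seq = List[List[float]]


def causal_dilated_conv1d(x: Seq, weights: List[List[List[float]]],
                          bias: List[float], dilation: int) -> Seq:
    """One causal, dilated 1-D convolution over time (hypothetical helper).

    weights[k][o][i] is the tap for kernel offset k, output channel o, input channel i.
    Output at time t only sees inputs at t, t - d, t - 2d, ... (never future frames).
    """
    kernel, out_ch, in_ch = len(weights), len(bias), len(x[0])
    y = []
    for t in range(len(x)):
        row = list(bias)
        for k in range(kernel):
            src = t - k * dilation
            if src < 0:
                continue  # implicit left zero-padding keeps the layer causal
            for o in range(out_ch):
                row[o] += sum(weights[k][o][i] * x[src][i] for i in range(in_ch))
        y.append(row)
    return y


def relu(x: Seq) -> Seq:
    return [[max(0.0, v) for v in row] for row in x]


def random_layer(kernel: int, in_ch: int, out_ch: int, rng: random.Random):
    # Untrained placeholder weights; a real system would learn these
    # (per the project description, with an adversarial objective).
    w = [[[rng.uniform(-0.5, 0.5) for _ in range(in_ch)]
          for _ in range(out_ch)] for _ in range(kernel)]
    return w, [0.0] * out_ch


def receptive_field(kernel: int, dilations: List[int]) -> int:
    # Number of past frames (including the current one) each output can see.
    return 1 + sum((kernel - 1) * d for d in dilations)


def audio_to_mouth(audio_feats: Seq, layers, dilations: List[int]) -> Seq:
    h = audio_feats
    for idx, ((w, b), d) in enumerate(zip(layers, dilations)):
        h = causal_dilated_conv1d(h, w, b, d)
        if idx < len(layers) - 1:  # no activation on the output layer
            h = relu(h)
    return h


# Usage: 50 frames of 13-dim audio features -> 50 frames of 40-dim mouth features.
rng = random.Random(0)
dilations = [1, 2, 4, 8]
dims = [13, 16, 16, 16, 40]
layers = [random_layer(3, dims[i], dims[i + 1], rng) for i in range(len(dilations))]
audio = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
mouth = audio_to_mouth(audio, layers, dilations)
print(len(mouth), len(mouth[0]), receptive_field(3, dilations))  # → 50 40 31
```

Doubling the dilation at each layer lets the receptive field grow quickly, here to 31 frames with only four layers. This gives each mouth frame a window of audio context around the current phoneme. Causality is a design choice in this sketch. An offline system could also use non-causal (centered) kernels to look at upcoming audio.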

Publications

“A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors”, Accepted by ICPR 2020.
“ObamanetHD: Towards High-Resolution and Pose-Adaptive Lip Sync from Speech”, Under review.

Applications

Smart Quiz App