Robin Zheng

A Neural Lip-Sync Framework for Synthesizing High-Resolution, Photorealistic, and Pose-Adaptive Virtual News Anchors

We develop a neural framework for producing speech-driven virtual news anchors. Our goal is to synthesize high-resolution, photorealistic, visually natural videos from audio content and a target character. Our solution can benefit a variety of scenarios, including video editing, film production, game development, and smart agents.

Key Components

  • Audio-to-mouth mapping learned via adversarial temporal convolutional networks
  • High-resolution and photorealistic neural rendering
  • Background matching and motion editing
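
To make the first component more concrete, here is a minimal sketch of a temporal convolutional stack that maps a sequence of audio feature frames to a sequence of mouth parameters. It is an illustration under assumptions, not the framework's actual network: the feature sizes (13 audio features, 8 mouth parameters), the layer count, the dilations, and every function name are hypothetical, the weights are random rather than trained, and the adversarial training part is omitted entirely.

```python
import random

def temporal_conv1d(frames, kernel, bias, dilation=1):
    """Dilated 1D convolution over time with zero padding (output length = input length).

    frames: list of T frames, each a list of C_in floats
    kernel: list of K taps, each a C_out x C_in weight matrix
    bias:   list of C_out floats
    """
    n_frames, n_taps = len(frames), len(kernel)
    c_in, c_out = len(frames[0]), len(bias)
    center = n_taps // 2
    out = []
    for t in range(n_frames):
        y = list(bias)
        for k in range(n_taps):
            src = t + (k - center) * dilation
            if 0 <= src < n_frames:  # frames outside the clip act as zeros
                x, w = frames[src], kernel[k]
                for o in range(c_out):
                    y[o] += sum(w[o][i] * x[i] for i in range(c_in))
        out.append(y)
    return out

def relu(frames):
    return [[max(0.0, v) for v in frame] for frame in frames]

def random_kernel(rng, taps, c_out, c_in, scale=0.1):
    """Random placeholder weights; a real model would learn these."""
    return [[[rng.uniform(-scale, scale) for _ in range(c_in)]
             for _ in range(c_out)] for _ in range(taps)]

def audio_to_mouth(audio_frames, layers):
    """Apply the conv layers in order: ReLU between layers, linear output."""
    h = audio_frames
    for idx, (kernel, bias, dilation) in enumerate(layers):
        h = temporal_conv1d(h, kernel, bias, dilation)
        if idx < len(layers) - 1:
            h = relu(h)
    return h

# Hypothetical sizes, chosen only for illustration.
N_AUDIO, N_HIDDEN, N_MOUTH = 13, 16, 8
rng = random.Random(0)
layers = [
    (random_kernel(rng, 3, N_HIDDEN, N_AUDIO), [0.0] * N_HIDDEN, 1),
    (random_kernel(rng, 3, N_HIDDEN, N_HIDDEN), [0.0] * N_HIDDEN, 2),
    (random_kernel(rng, 3, N_MOUTH, N_HIDDEN), [0.0] * N_MOUTH, 4),
]

# 50 frames of stand-in audio features; real input would come from the speech signal.
audio = [[rng.uniform(-1.0, 1.0) for _ in range(N_AUDIO)] for _ in range(50)]
mouth = audio_to_mouth(audio, layers)
print(len(mouth), len(mouth[0]))  # one mouth-parameter vector per audio frame
```

Increasing the dilation from layer to layer lets each output frame depend on a wider window of surrounding audio without adding more weights per layer, which is the usual motivation for temporal convolutions in sequence-to-sequence mappings like this one.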

Publications

  • “A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors”, accepted by ICPR 2020.
  • “ObamanetHD: Towards High-Resolution and Pose-Adaptive Lip Sync from Speech”, under review.

Applications

  • Smart Quiz App
  • 3D Cartoon