Efficient Video Portrait Reenactment via Grid-based Codebook

While progress has been made in portrait reenactment, efficiently producing high-fidelity and accurate videos remains an open problem. Recent studies build direct mappings between driving signals and their predictions, which leads to failures when synthesizing background textures and detailed local motions. In this paper, we propose the Video Portrait via Grid-based Codebook (VPGC) framework, which achieves efficient and high-fidelity portrait modeling. Our key insight is to query driving signals in a position-aware textural codebook with an explicit grid structure. Motivated by our observations on video portraits, the grid-based codebook stores delicate textural information locally, allowing it to be learned efficiently and precisely. We further design a Prior-Guided Driving Module to predict reliable features from the driving signals, which can later be decoded into high-quality video portraits by querying the codebook. Comprehensive experiments validate the effectiveness of our approach.
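To make the core idea concrete, below is a minimal PyTorch sketch of what a position-aware grid codebook lookup could look like: each spatial cell owns its own small set of learnable code vectors, and a predicted feature at that cell is quantized to its nearest local code. All names and shapes here (GridCodebook, num_codes, the straight-through estimator) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of a position-aware grid codebook (not the paper's code).
# Each grid cell (h, w) holds its own K code vectors, so texture is stored
# locally rather than in one global codebook.
import torch
import torch.nn as nn


class GridCodebook(nn.Module):
    def __init__(self, grid_h: int, grid_w: int, num_codes: int, dim: int):
        super().__init__()
        # One small codebook per grid cell: (H, W, K, D).
        self.codes = nn.Parameter(torch.randn(grid_h, grid_w, num_codes, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, H, W, D) features predicted from the driving signal.
        B, H, W, D = feats.shape
        # Squared distance from each feature to every code in its own cell.
        diff = feats.unsqueeze(3) - self.codes.unsqueeze(0)  # (B, H, W, K, D)
        dist = diff.pow(2).sum(-1)                           # (B, H, W, K)
        idx = dist.argmin(dim=-1)                            # (B, H, W)
        # Gather the selected code vector for each spatial position.
        codes = self.codes.unsqueeze(0).expand(B, -1, -1, -1, -1)
        quantized = torch.gather(
            codes, 3, idx[..., None, None].expand(-1, -1, -1, 1, D)
        ).squeeze(3)                                         # (B, H, W, D)
        # Straight-through estimator so gradients reach the feature predictor.
        return feats + (quantized - feats).detach()
```

The per-cell design reflects the abstract's observation about video portraits: with a roughly fixed camera and head position, the texture appearing at each spatial location varies little across frames, so a small local codebook per cell can capture it more precisely, and be trained more efficiently, than one large global codebook.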