VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Zhiyong Wu, Jun Ling, Sheng Zhao, Runnan Li, Xuejiao Tan, Liyang Chen, Weihong Bao
[1] Zhenhui Ye et al. GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. ICLR, 2023.
[2] Tangjie Lv et al. StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles. AAAI, 2023.
[3] Jun Ling et al. StableFace: Analyzing and Improving Motion Stability for Talking Face Generation. arXiv, 2022.
[4] Se Jin Park et al. SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory. AAAI, 2022.
[5] Xiaoguang Han et al. Expressive Talking Head Generation with Granular Audio-Visual Control. CVPR, 2022.
[6] Wayne Wu et al. EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model. SIGGRAPH, 2022.
[7] Chen Change Loy et al. Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory. CVPR, 2022.
[8] Xin Yu et al. One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning. AAAI, 2021.
[9] Foivos Paraperas Papantoniou et al. Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos. CVPR, 2022.
[10] Haozhe Wu et al. Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis. ACM Multimedia, 2021.
[11] Thomas J. Cashman et al. Fake it till you make it: face analysis in the wild using synthetic data alone. ICCV, 2021.
[12] Madhukar Budagavi et al. FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning. ICCV, 2021.
[13] Yu Ding et al. Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset. CVPR, 2021.
[14] Vivek Kwatra et al. LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization. CVPR, 2021.
[15] Chen Change Loy et al. Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation. CVPR, 2021.
[16] Xun Cao et al. Audio-Driven Emotional Video Portraits. CVPR, 2021.
[17] David A. Ross et al. AI Choreographer: Music Conditioned 3D Dance Generation with AIST++. ICCV, 2021.
[18] Tie-Yan Liu et al. DualLip: A System for Joint Lip Reading and Generation. ACM Multimedia, 2020.
[19] C. V. Jawahar et al. A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild. ACM Multimedia, 2020.
[20] Christian Richardt et al. Photorealistic Audio-driven Video Portraits. IEEE Transactions on Visualization and Computer Graphics, 2020.
[21] Michael I. Jordan et al. Decision-Making with Auto-Encoding Variational Bayes. NeurIPS, 2020.
[22] Chen Change Loy et al. Everybody's Talkin': Let Me Talk as You Want. IEEE Transactions on Information Forensics and Security, 2020.
[23] Justus Thies et al. Neural Voice Puppetry: Audio-driven Facial Reenactment. ECCV, 2019.
[24] Chenliang Xu et al. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss. CVPR, 2019.
[25] Michael J. Black et al. Capture, Learning, and Synthesis of 3D Speaking Styles. CVPR, 2019.
[26] Jiaolong Yang et al. Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set. CVPR Workshops, 2019.
[27] Yuxuan Wang et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. ICML, 2018.
[28] Yoshua Bengio et al. ObamaNet: Photo-realistic lip-sync from text. arXiv, 2017.
[29] Sepp Hochreiter et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NIPS, 2017.
[30] Lukasz Kaiser et al. Attention Is All You Need. NIPS, 2017.
[31] Georgios Tzimiropoulos et al. How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks). ICCV, 2017.
[32] Tal Hassner et al. Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network. CVPR, 2017.
[33] Max Welling et al. Improving Variational Auto-Encoders using Householder Flow. arXiv, 2016.
[34] Alexei A. Efros et al. Image-to-Image Translation with Conditional Adversarial Networks. CVPR, 2017.
[35] Joon Son Chung et al. Lip Reading in the Wild. ACCV, 2016.
[36] Heiga Zen et al. WaveNet: A Generative Model for Raw Audio. SSW, 2016.
[37] Hao Wang et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training. ICME, 2016.
[38] Li Fei-Fei et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. ECCV, 2016.
[39] Honglak Lee et al. Learning Structured Output Representation using Deep Conditional Generative Models. NIPS, 2015.
[40] Shakir Mohamed et al. Variational Inference with Normalizing Flows. ICML, 2015.
[41] Erich Elsen et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv, 2014.
[42] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR, 2014.
[43] Lina J. Karam et al. A no-reference perceptual image sharpness metric based on a cumulative probability of blur detection. International Workshop on Quality of Multimedia Experience, 2009.
[44] Jon Barker et al. An audio-visual corpus for speech perception and automatic speech recognition. The Journal of the Acoustical Society of America, 2006.
[45] Eero P. Simoncelli et al. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
[46] Eduardo de Campos Valadares et al. Dancing to the music. 2000.
[47] S. Hochreiter et al. Long Short-Term Memory. Neural Computation, 1997.
[48] Karla A. Woodward. Alone. 1994.
[49] S. Umeyama et al. Least-Squares Estimation of Transformation Parameters Between Two Point Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991.
[50] Kei Sawada et al. Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning. ICLR, 2021.
[51] Yu Qiao et al. MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation. ECCV, 2020.
[52] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014.
[53] Geoffrey E. Hinton et al. Visualizing Data using t-SNE. 2008.
[54] Samuel B. Williams. Association for Computing Machinery. 2000.