LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization
Vivek Kwatra | Avisek Lahiri | Christian Frueh | John Lewis | Chris Bregler