Speech-Driven Facial Animation Using Cascaded GANs for Learning of Motion and Texture
暂无分享,去创建一个
Dipanjan Das | Brojeshwar Bhowmick | Sanjana Sinha | Sandika Biswas | Dipanjan Das | Sanjana Sinha | S. Biswas | Brojeshwar Bhowmick
[1] Jingwen Zhu,et al. Talking Face Generation by Conditional Recurrent Adversarial Network , 2018, IJCAI.
[2] Gaurav Mittal,et al. Animating Face using Disentangled Audio Representations , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[3] Bernhard Schölkopf,et al. A Kernel Method for the Two-Sample-Problem , 2006, NIPS.
[4] Chenliang Xu,et al. Deep Cross-Modal Audio-Visual Generation , 2017, ACM Multimedia.
[5] Louis-Philippe Morency,et al. OpenFace 2.0: Facial Behavior Analysis Toolkit , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[6] Josephine Sullivan,et al. One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[7] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[8] Serge J. Belongie,et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[9] Tianqi Chen,et al. Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.
[10] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Anuj Srivastava,et al. Statistical shape analysis: clustering, learning, and testing , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Hang Zhou,et al. Talking Face Generation by Adversarially Disentangled Audio-Visual Representation , 2018, AAAI.
[13] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[14] Lina J. Karam,et al. A no-reference perceptual image sharpness metric based on a cumulative probability of blur detection , 2009, 2009 International Workshop on Quality of Multimedia Experience.
[15] Gang Yu,et al. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.
[16] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[17] Siwei Lyu,et al. In Ictu Oculi: Exposing AI Created Fake Videos by Detecting Eye Blinking , 2018, 2018 IEEE International Workshop on Information Forensics and Security (WIFS).
[18] Maja Pantic,et al. Realistic Speech-Driven Facial Animation with GANs , 2019, International Journal of Computer Vision.
[19] Siwei Lyu,et al. In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking , 2018, ArXiv.
[20] Patrick Pérez,et al. VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track , 2015, Comput. Graph. Forum.
[21] Chenliang Xu,et al. Lip Movements Generation at a Glance , 2018, ECCV.
[22] Maja Pantic,et al. End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs , 2019, CVPR Workshops.
[23] Hao Zhu,et al. High-Resolution Talking Face Generation via Mutual Information Approximation , 2018, ArXiv.
[24] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[25] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[26] Jeff Donahue,et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.
[27] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.
[28] Chenliang Xu,et al. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Han Zhang,et al. Self-Attention Generative Adversarial Networks , 2018, ICML.
[30] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[31] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[32] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[33] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph..
[34] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[35] Michael J. Black,et al. Capture, Learning, and Synthesis of 3D Speaking Styles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Yuichi Yoshida,et al. Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.
[37] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).