Haitian Zheng | Lele Chen | Guofeng Cui | Ziyi Kou | Chenliang Xu
[1] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[2] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[3] Rada Mihalcea,et al. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations , 2018, ACL.
[4] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.
[6] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph..
[7] Xiaoming Liu,et al. Representation Learning by Rotating Your Faces , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Patrick Pérez,et al. Deep video portraits , 2018, ACM Trans. Graph..
[9] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[10] Hujun Bao,et al. Audio-driven Talking Face Video Generation with Natural Head Pose , 2020, ArXiv.
[11] Ragini Verma,et al. CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset , 2014, IEEE Transactions on Affective Computing.
[12] Tony Ezzat,et al. Transferable videorealistic speech animation , 2005, SCA '05.
[13] Andrzej Czyzewski,et al. An audio-visual corpus for multimodal automatic speech recognition , 2017, Journal of Intelligent Information Systems.
[14] Shilin Wang,et al. Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Chen Sun,et al. Unsupervised Learning of Object Structure and Dynamics from Videos , 2019, NeurIPS.
[16] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Alexei A. Efros,et al. Everybody Dance Now , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[18] Maja Pantic,et al. Realistic Speech-Driven Facial Animation with GANs , 2019, International Journal of Computer Vision.
[19] Joon Son Chung,et al. You Said That?: Synthesising Talking Faces from Audio , 2019, International Journal of Computer Vision.
[20] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[21] Carlos Busso,et al. MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception , 2017, IEEE Transactions on Affective Computing.
[22] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008, J. Mach. Learn. Res..
[23] Lianhong Cai,et al. Head and facial gestures synthesis using PAD model for an expressive talking avatar , 2014, Multimedia Tools and Applications.
[24] Patrick Pérez,et al. VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track , 2015, Comput. Graph. Forum.
[25] Justus Thies,et al. Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.
[26] Jan Kautz,et al. Few-shot Video-to-Video Synthesis , 2019, NeurIPS.
[27] S. R. Livingstone,et al. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English , 2018, PloS one.
[28] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[29] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Siwei Zhang,et al. One-shot Face Reenactment , 2019, BMVC.
[31] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[32] Antonio Camurri,et al. Toward a Minimal Representation of Affective Gestures , 2011, IEEE Transactions on Affective Computing.
[33] Giampiero Salvi,et al. Using HMMs and ANNs for mapping acoustic to visual speech , 1999 .
[34] Chenliang Xu,et al. Lip Movements Generation at a Glance , 2018, ECCV.
[35] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph..
[36] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.
[37] Shenghua Gao,et al. Future Frame Prediction for Anomaly Detection - A New Baseline , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[38] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] D. McNeill,et al. Speech-gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information , 1998 .
[40] Antonio Torralba,et al. Generating Videos with Scene Dynamics , 2016, NIPS.
[41] Joon Son Chung,et al. LRS3-TED: a large-scale dataset for visual speech recognition , 2018, ArXiv.
[42] Jörn Ostermann,et al. Realistic facial expression synthesis for an image-based talking head , 2011, 2011 IEEE International Conference on Multimedia and Expo.
[43] Francesc Moreno-Noguer,et al. GANimation: One-Shot Anatomically Consistent Facial Animation , 2019, International Journal of Computer Vision.
[44] Andrew Zisserman,et al. X2Face: A network for controlling face generation by using images, audio, and pose codes , 2018, ECCV.
[45] Jingwen Zhu,et al. Talking Face Generation by Conditional Recurrent Adversarial Network , 2018, IJCAI.
[46] Taesung Park,et al. Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Jan Kautz,et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Nicu Sebe,et al. Appearance and Pose-Conditioned Human Image Generation Using Deformable GANs , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[49] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[50] Georgios Tzimiropoulos,et al. How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[51] King-Sun Fu,et al. IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[53] Adam Finkelstein,et al. Text-based editing of talking-head video , 2019, ACM Trans. Graph..
[54] Jaegul Choo,et al. Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Hang Zhou,et al. Talking Face Generation by Adversarially Disentangled Audio-Visual Representation , 2018, AAAI.
[56] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[57] Jaakko Lehtinen,et al. Few-Shot Unsupervised Image-to-Image Translation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[58] Jan Kautz,et al. Video-to-Video Synthesis , 2018, NeurIPS.
[59] Andreas Rössler,et al. FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[60] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[61] Lina J. Karam,et al. A no-reference perceptual image sharpness metric based on a cumulative probability of blur detection , 2009, 2009 International Workshop on Quality of Multimedia Experience.
[62] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[63] Daniel Cohen-Or,et al. Bringing portraits to life , 2017, ACM Trans. Graph..
[64] Victor Lempitsky,et al. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[65] Jitendra Malik,et al. Learning Individual Styles of Conversational Gesture , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Thomas Huang,et al. FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis , 2019, AAAI.
[67] Chenliang Xu,et al. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Samy Bengio,et al. Large Scale Online Learning of Image Similarity Through Ranking , 2009, J. Mach. Learn. Res..
[69] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.