Taku Komura | Jun Saito | Zhaojiang Lin | Wenping Wang | Yingruo Fan
[1] Dominic W. Massaro,et al. Animated speech: research progress and applications , 2001, AVSP.
[2] Erik Cambria,et al. Tensor Fusion Network for Multimodal Sentiment Analysis , 2017, EMNLP.
[3] Erik Cambria,et al. Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[4] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[5] Zhenfeng Fan,et al. 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head , 2021, ArXiv.
[6] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[7] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[8] Neal P. Fox,et al. Speaker-normalized sound representations in the human auditory cortex , 2019, Nature Communications.
[9] P. Ekman,et al. EMFACS-7: Emotional Facial Action Coding System , 1983.
[10] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[11] Frédéric H. Pighin,et al. Expressive speech-driven facial animation , 2005, ACM Trans. Graph.
[12] Luc Van Gool,et al. A 3-D Audio-Visual Corpus of Affective Communication , 2010, IEEE Transactions on Multimedia.
[13] Yuyu Xu,et al. A Practical and Configurable Lip Sync Method for Games , 2013, MIG.
[14] Moshe Mahler,et al. Dynamic units of visual speech , 2012, SCA '12.
[15] Subhransu Maji,et al. VisemeNet , 2018, ACM Trans. Graph.
[16] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.
[17] Colin Raffel,et al. librosa: Audio and Music Signal Analysis in Python , 2015, SciPy.
[18] Gustav Eje Henter,et al. Gesticulator: A framework for semantically-aware speech-driven gesture generation , 2020, ICMI.
[19] Peter Robinson,et al. OpenFace: An open source facial behavior analysis toolkit , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).
[20] Michael J. Black,et al. Capture, Learning, and Synthesis of 3D Speaking Styles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph.
[22] P. Ekman,et al. Facial action coding system: a technique for the measurement of facial movement , 1978.
[23] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[24] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[25] Yaser Sheikh,et al. MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Eugene Fiume,et al. JALI , 2016, ACM Trans. Graph.
[27] C. G. Fisher,et al. Confusions among visually perceived consonants , 1968, Journal of Speech and Hearing Research.
[28] Youngwoo Yoon,et al. Speech gesture generation from the trimodal context of text, audio, and speaker identity , 2020, ACM Trans. Graph..