Cross-Domain Deep Visual Feature Generation for Mandarin Audio–Visual Speech Recognition
Xunying Liu | Rongfeng Su | Lan Wang | Jingzhou Yang
[1] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Lan Wang,et al. Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription , 2018, INTERSPEECH.
[3] Jianwu Dang,et al. Audio-visual speech recognition integrating 3D lip information obtained from the Kinect , 2016, Multimedia Systems.
[4] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Mark J. F. Gales,et al. The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation , 2015, INTERSPEECH.
[6] Zhigang Deng,et al. Natural head motion synthesis driven by acoustic prosodic features , 2005, Comput. Animat. Virtual Worlds.
[7] Mark J. F. Gales,et al. Development of the CUHTK 2004 Mandarin conversational telephone speech transcription system , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
[8] Reda A. El-Khoribi,et al. Audio-Visual Speech Recognition for People with Speech Disorders , 2014.
[9] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011.
[10] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Georg Heigold,et al. Development of the 2007 RWTH Mandarin LVCSR system , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).
[12] Wen Wang,et al. Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Lukás Burget,et al. Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.
[14] Tetsuya Takiguchi,et al. Audio-Visual Speech Recognition for a Person with Severe Hearing Loss Using Deep Canonical Correlation Analysis , 2017.
[15] Dimitra Vergyri,et al. Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[17] Yi Liu,et al. Recent advances in the IBM GALE Mandarin transcription system , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Benjamin Schrauwen,et al. Training and Analysing Deep Recurrent Neural Networks , 2013, NIPS.
[19] Yifan Gong,et al. Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[20] Andreas Stolcke,et al. Articulatory trajectories for large-vocabulary speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Tetsuya Ogata,et al. Audio-visual speech recognition using deep learning , 2014, Applied Intelligence.
[22] Ahmed Farag,et al. A robust speech disorders correction system for Arabic language using visual speech recognition , 2013.
[23] Simon King,et al. Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Bin Ma,et al. Robust Audio-visual Speech Recognition Using Bimodal Dfsmn with Multi-condition Training and Dropout Regularization , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res.
[26] Ben P. Milner,et al. Audio-to-Visual Speech Conversion Using Deep Neural Networks , 2016, INTERSPEECH.
[27] Satoshi Tamura,et al. Integration of deep bottleneck features for audio-visual speech recognition , 2015, INTERSPEECH.
[28] Peng Liu,et al. A deep recurrent approach for acoustic-to-articulatory inversion , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Elliot Saltzman,et al. Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition , 2017, Speech Commun.
[30] Wei Chen,et al. Modality Attention for End-to-end Audio-visual Speech Recognition , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Jing Huang,et al. Audio-visual deep learning for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[32] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[33] Dani Byrd,et al. TADA: An enhanced, portable Task Dynamics model in MATLAB , 2004.
[34] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Chao Zhang,et al. Supplementary data for "Parameterised Sigmoid and ReLU Hidden Activation Functions for DNN Acoustic Modelling" , 2015.
[36] Jenq-Neng Hwang,et al. Hidden Markov Model Inversion for Audio-to-Visual Conversion in an MPEG-4 Facial Animation System , 2001, J. VLSI Signal Process.
[37] Farshad Almasganj,et al. Audio-visual feature fusion via deep neural networks for automatic speech recognition , 2018, Digit. Signal Process.
[38] Elliot Saltzman,et al. Articulatory features from deep neural networks and their role in speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[40] Ricardo Gutierrez-Osuna,et al. Audio/visual mapping with cross-modal hidden Markov models , 2005, IEEE Transactions on Multimedia.
[41] Tetsuya Takiguchi,et al. Audio-Visual Speech Recognition Using Convolutive Bottleneck Networks for a Person with Severe Hearing Loss , 2015.
[42] Elliot Saltzman,et al. Articulatory Information for Noise Robust Speech Recognition , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[43] Tetsuya Takiguchi,et al. Multimodal speech recognition of a person with articulation disorders using AAM and MAF , 2010, 2010 IEEE International Workshop on Multimedia Signal Processing.
[44] Dong Yu,et al. Exploring convolutional neural network structures and optimization techniques for speech recognition , 2013, INTERSPEECH.
[45] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Carlos Busso,et al. Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[47] Jun Yu,et al. A multi-channel/multi-speaker interactive 3D audio-visual speech corpus in Mandarin , 2016, 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[48] Lan Wang,et al. Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information , 2016, INTERSPEECH.
[49] Laurens van der Maaten,et al. Accelerating t-SNE using tree-based algorithms , 2014, J. Mach. Learn. Res.
[50] Florian Metze,et al. Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach , 2016, INTERSPEECH.