Audio-visual feature fusion via deep neural networks for automatic speech recognition
Farshad Almasganj | Seyyed Ali Seyyedsalehi | Mohammad Hasan Rahmani
[1] Dong Yu, et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[2] J.N. Gowdy, et al. CUAVE: A new audio-visual database for multimodal human-computer interface research, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[3] Tara N. Sainath, et al. Deep Convolutional Neural Networks for Large-scale Speech Tasks, 2015, Neural Networks.
[4] Sridha Sridharan, et al. Adaptive Fusion of Speech and Lip Information for Robust Speaker Identification, 2001, Digit. Signal Process.
[5] Farshad Almasganj, et al. Lip-reading via a DNN-HMM hybrid system using combination of the image-based and model-based features, 2017, 2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA).
[6] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.
[7] Satoshi Tamura, et al. Investigation of DNN-Based Audio-Visual Speech Recognition, 2016, IEICE Trans. Inf. Syst.
[8] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[9] Seyyed Ali Seyyedsalehi, et al. A fast and efficient pre-training method based on layer-by-layer maximum discrimination for deep neural networks, 2015, Neurocomputing.
[10] Herman J. M. Steeneken, et al. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, 1993, Speech Commun.
[11] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[12] Jing Huang, et al. Audio-visual deep learning for noise robust speech recognition, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[13] Tetsuya Ogata, et al. Audio-visual speech recognition using deep learning, 2014, Applied Intelligence.
[14] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[15] Kevin P. Murphy, et al. A coupled HMM for audio-visual speech recognition, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[16] Yuan Yuan, et al. Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading, 2017, ArXiv.
[17] Vaibhava Goel, et al. Deep multimodal learning for Audio-Visual Speech Recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Lukás Burget, et al. Sequence-discriminative training of deep neural networks, 2013, INTERSPEECH.
[19] Murat Hüsnü Sazli, et al. Speech recognition with artificial neural networks, 2010, Digit. Signal Process.
[20] Maja Pantic, et al. End-to-end visual speech recognition with LSTMs, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[22] Mehryar Mohri, et al. Speech Recognition with Weighted Finite-State Transducers, 2008.
[23] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[24] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.
[25] M. Kramer. Nonlinear principal component analysis using autoassociative neural networks, 1991.
[26] Darryl Stewart, et al. Robust Audio-Visual Speech Recognition Under Noisy Audio-Video Conditions, 2014, IEEE Transactions on Cybernetics.
[27] Jan Cernocký, et al. Probabilistic and Bottle-Neck Features for LVCSR of Meetings, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.