Fusion Architectures for Word-Based Audiovisual Speech Recognition
[1] Michael Wand, et al. Motion Dynamics Improve Speaker-Independent Lipreading, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Naomi Harte, et al. Can DNNs Learn to Lipread Full Sentences?, 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).
[3] Tetsuya Ogata, et al. Lipreading using convolutional neural network, 2014, INTERSPEECH.
[4] Jenq-Neng Hwang, et al. Lipreading from color video, 1997, IEEE Trans. Image Process.
[5] S. Huffel, et al. Non-EEG seizure-detection systems and potential SUDEP prevention: State of the art, 2013, Seizure.
[6] Xavier Serra, et al. Freesound technical demo, 2013, ACM Multimedia.
[7] Jürgen Schmidhuber, et al. Learning to Forget: Continual Prediction with LSTM, 2000, Neural Computation.
[8] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[9] Dezhong Yao, et al. EEG/fMRI fusion based on independent component analysis: integration of data-driven and model-driven methods, 2012, Journal of integrative neuroscience.
[10] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[11] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Dorothea Kolossa, et al. Learning Dynamic Stream Weights For Coupled-HMM-Based Audio-Visual Speech Recognition, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Björn W. Schuller, et al. Recent developments in openSMILE, the Munich open-source multimedia feature extractor, 2013, ACM Multimedia.
[14] Eric D. Petajan. Automatic lipreading to enhance speech recognition, 1984.
[15] Robert M. Nickel, et al. Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR, 2016, INTERSPEECH.
[16] Jürgen Schmidhuber, et al. Lipreading with long short-term memory, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Davis E. King, et al. Dlib-ml: A Machine Learning Toolkit, 2009, J. Mach. Learn. Res.
[18] Naomi Harte, et al. Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition, 2018, ICMI.
[19] Maja Pantic, et al. Deep complementary bottleneck features for visual speech recognition, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Tanja Schultz, et al. Biosignal-Based Spoken Communication: A Survey, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Marc'Aurelio Ranzato, et al. DeViSE: A Deep Visual-Semantic Embedding Model, 2013, NIPS.
[23] Jürgen Schmidhuber, et al. Improving Speaker-Independent Lipreading with Domain-Adversarial Training, 2017, INTERSPEECH.
[24] Petr Motlícek, et al. A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition, 2016, IEEE Signal Processing Letters.
[25] Matthew Turk, et al. Multimodal interaction: A review, 2014, Pattern Recognit. Lett.
[26] Eric David Petajan, et al. Automatic Lipreading to Enhance Speech Recognition (Speech Reading), 1984.
[27] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.
[28] Christian Jutten, et al. Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects, 2015, Proceedings of the IEEE.
[29] Mirela C. Popa, et al. Multimodal fusion based on information gain for emotion recognition in the wild, 2017, 2017 Intelligent Systems Conference (IntelliSys).
[30] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[31] B.P. Yuhas, et al. Integration of acoustic and visual speech signals using neural networks, 1989, IEEE Communications Magazine.
[32] Ahmed Hussen Abdelaziz. Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Ngoc Thang Vu, et al. Investigations on End-to-End Audiovisual Fusion, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).