Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications
[1] Hao Li,et al. Learning the Relative Dynamic Features for Word-Level Lipreading , 2022, Sensors.
[2] S. Jeon,et al. End-to-End Lip-Reading Open Cloud-Based Speech Architecture , 2022, Sensors.
[3] Perry Xiao,et al. An Effective Conversion of Visemes to Words for High-Performance Automatic Lipreading , 2021, Sensors.
[4] Petros Maragos,et al. A robotic edutainment framework for designing child-robot interaction scenarios , 2021, PETRA.
[5] Naoyuki Kubota,et al. Lifelong Robot Edutainment based on Self-Efficacy , 2021, 2021 5th IEEE International Conference on Cybernetics (CYBCONF).
[6] Gwang Yong Gim,et al. The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API , 2021, Int. J. Networked Distributed Comput.
[7] Matthias Schultalbers,et al. Speech recognition system for a service robot - a performance evaluation , 2020, 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV).
[8] Chulhee Lee,et al. Revisiting spatial dropout for regularizing convolutional neural networks , 2020, Multimedia Tools and Applications.
[9] Mauro Castelli,et al. The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset , 2020, ICT Express.
[10] Ornella Mich,et al. Framing the Design Space of Multimodal Mid-Air Gesture and Speech-Based Interaction With Mobile Devices for Older People , 2020, Int. J. Mob. Hum. Comput. Interact.
[11] Michal Vavrečka,et al. Edutainment Software for the Pepper Robot , 2019 .
[12] Daniel Hepperle,et al. 2D, 3D or speech? A case study on which user interface is preferable for what kind of object interaction in immersive virtual reality , 2019, Comput. Graph..
[13] Kuo-Hsing Cheng,et al. A Sketch Classifier Technique with Deep Learning Models Realized in an Embedded System , 2019, 2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS).
[14] Von-Wun Soo,et al. AI Applications on Music Technology for Edutainment , 2018, ICITL.
[15] In-So Kweon,et al. CBAM: Convolutional Block Attention Module , 2018, ECCV.
[16] Haslina Arshad,et al. User Satisfaction for an Augmented Reality Application to Support Productive Vocabulary Using Speech Recognition , 2018, Adv. Multim..
[17] Carlo Luschi,et al. Revisiting Small Batch Training for Deep Neural Networks , 2018, ArXiv.
[18] Pete Warden,et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition , 2018, ArXiv.
[19] Kai Xu,et al. LCANet: End-to-End Lipreading with Cascaded Attention-CTC , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[20] Andreas Stolcke,et al. The Microsoft 2017 Conversational Speech Recognition System , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] In So Kweon,et al. Convolutional Block Attention Module , 2018, ECCV 2018.
[22] Yang You,et al. Large Batch Training of Convolutional Networks , 2017, arXiv:1708.03888.
[23] Veton Kepuska,et al. Comparing Speech Recognition Systems (Microsoft API, Google API And CMU Sphinx) , 2017 .
[24] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[25] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[27] Maja Pantic,et al. Deep complementary bottleneck features for visual speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Jürgen Schmidhuber,et al. Lipreading with long short-term memory , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Celia Woolf,et al. Using voice recognition software to improve communicative writing and social participation in an individual with severe acquired dysgraphia: an experimental single-case therapy study , 2015 .
[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[31] Jonathan Tompson,et al. Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Jonathan H. Venezia,et al. Multisensory Integration and Audiovisual Speech Perception , 2015 .
[33] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[34] Tetsuya Ogata,et al. Lipreading using convolutional neural network , 2014, INTERSPEECH.
[35] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[36] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[37] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Kathrin Janowski,et al. Gestures or speech? Comparing modality selection for different interaction tasks in a virtual environment , 2013 .
[39] Riccardo Berta,et al. Assessment in and of Serious Games: An Overview , 2013, Adv. Hum. Comput. Interact.
[40] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[41] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[42] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[43] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[44] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..
[45] Yılmaz Kara,et al. Comparing the Impacts of Tutorial and Edutainment Software Programs on Students’ Achievements, Misconceptions, and Attitudes towards Biology , 2008 .
[46] Ruth Campbell,et al. The processing of audio-visual speech: empirical and neural bases , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.
[47] Kok Wai Wong,et al. Similarities and differences between learn through play and edutainment , 2006 .
[48] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[49] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[50] Jeffery A. Jones,et al. Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect , 2003, Neuroreport.
[51] L. Bernstein,et al. Single-channel vibrotactile supplements to visual perception of intonation and stress. , 1989, The Journal of the Acoustical Society of America.
[52] P K Kuhl,et al. The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speechreading in normal-hearing subjects. , 1985, The Journal of the Acoustical Society of America.
[53] Barbara Dodd,et al. The Role of Vision in the Perception of Speech , 1977, Perception.
[54] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[55] W. H. Sumby,et al. Visual contribution to speech intelligibility in noise , 1954 .