Audio-visual keyword spotting based on adaptive decision fusion under noisy conditions for human-robot interaction
[1] Hongbin Zha,et al. Modeling facial expression space for recognition , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] Mohan M. Trivedi,et al. Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[3] Cheol Hoon Park,et al. Adaptive Decision Fusion for Audio-Visual Speech Recognition , 2008 .
[4] Matti Pietikäinen,et al. Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Trent W. Lewis,et al. Sensor Fusion Weighting Measures in Audio-Visual Speech Recognition , 2004, ACSC.
[6] Richard Rose,et al. A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[7] Sadaoki Furui,et al. A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
[8] Haiyang Li,et al. Mandarin keyword spotting using syllable based confidence features and SVM , 2011, 2011 2nd International Conference on Intelligent Control and Information Processing.
[9] Aggelos K. Katsaggelos,et al. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features , 2002, EURASIP J. Adv. Signal Process..
[10] Hiroshi G. Okuno,et al. Automatic speech recognition improved by two-layered audio-visual integration for robot audition , 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.
[11] Lukás Burget,et al. Comparison of keyword spotting approaches for informal continuous speech , 2005, INTERSPEECH.
[12] [Authors and title illegible in source] , 1991 .
[13] Alexandrina Rogozan,et al. Adaptive fusion of acoustic and visual sources for automatic speech recognition , 1998, Speech Commun..
[14] Waleed H. Abdulla,et al. WFST-based Large Vocabulary Continuous Speech Decoder for Service Robots , 2012 .
[15] Stefan Wermter,et al. Towards Robust Speech Recognition for Human-Robot Interaction , 2011 .
[16] Matti Pietikäinen,et al. Towards a practical lipreading system , 2011, CVPR 2011.
[17] Jeff A. Bilmes,et al. DBN based multi-stream models for audio-visual speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[19] Mohan M. Trivedi,et al. Audio-Visual Fusion and Tracking With Multilevel Iterative Decoding: Framework and Experimental Evaluation , 2010, IEEE Journal of Selected Topics in Signal Processing.
[20] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2022, IEEE Transactions on Multimedia.
[21] Ziyou Xiong,et al. Audio visual word spotting , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[22] Tetsuya Ogata,et al. Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Md. Tariquzzaman,et al. Performance Improvement of Audio-Visual Speech Recognition with Optimal Reliability Fusion , 2011, 2011 International Conference on Internet Computing and Information Services.
[24] Stephen J. Cox,et al. Audiovisual speech recognition using multiscale nonlinear image decomposition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[25] Zhiwei Shuang,et al. Improved Mandarin Keyword Spotting Using Confusion Garbage Model , 2010, 2010 20th International Conference on Pattern Recognition.
[26] Matti Pietikäinen,et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[27] A. Adjoudani,et al. On the Integration of Auditory and Visual Parameters in an HMM-based ASR , 1996 .
[28] Trevor Darrell,et al. Visual speech recognition with loosely synchronized feature streams , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[29] Juergen Luettin,et al. Audio-Visual Automatic Speech Recognition: An Overview , 2004 .
[30] Chalapathy Neti,et al. Stream confidence estimation for audio-visual speech recognition , 2000, INTERSPEECH.
[31] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.