Enhanced Automatic Speech Recognition with Non-acoustic Parameters

This paper discusses a novel method for improving the accuracy of an automatic speech recognition (ASR) system by adding non-acoustic parameters. Gestural features, which are commonly co-expressive with speech, are considered for improving the accuracy of the ASR system in noisy environments. Both dynamic and static gestures are integrated with the speech recognition system and tested under various environmental conditions, i.e., noise levels. The accuracy of a continuous speech recognition system and an isolated word recognition system is tested with and without gestures under various noise conditions. The addition of visual features provides stable recognition accuracy for acoustic signals under different environmental noise conditions.
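One common way to combine an acoustic stream with a visual (gesture) stream is feature-level fusion, where per-frame feature vectors from both modalities are concatenated before being passed to a single recognizer. The sketch below illustrates this idea only; the feature dimensions and the fusion scheme are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Illustrative per-frame features; dimensions are assumptions, not from the paper.
n_frames = 100
mfcc = np.random.randn(n_frames, 13)     # 13 MFCC coefficients per acoustic frame
gesture = np.random.randn(n_frames, 4)   # 4 visual gesture features per frame (assumed)

# Feature-level (early) fusion: concatenate the two streams so the
# recognizer observes one 17-dimensional vector per frame.
fused = np.concatenate([mfcc, gesture], axis=1)

print(fused.shape)  # (100, 17)
```

In practice the two streams must first be aligned to a common frame rate (e.g., by interpolating the slower video-derived gesture features up to the audio frame rate) before concatenation.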
