An Adaptive Methodology for Ubiquitous ASR System

Achieving and maintaining the performance of a ubiquitous Automatic Speech Recognition (ASR) system is a real challenge. The main objective of this work is to develop a method that improves the performance of a ubiquitous ASR system in real-world noisy environments and keeps that performance consistent. To meet this objective, an adaptive methodology has been developed that consists of the following steps: (1) cleaning the speech signal as much as possible, while preserving its original character and intelligibility, using various modified filters and enhancement techniques; (2) extracting features from the speech signal using parameters of various sizes; (3) training the system for ubiquitous environments using multi-environment adaptation training methods; and (4) optimizing the word recognition rate by selecting appropriate variable parameter sizes with a fuzzy technique. The consistency of performance is tested on standard noise databases as well as in real-world environments, and an improvement in recognition performance is observed. This work will help in the discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI) through a Speech User Interface (SUI).
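The feature extraction step with parameters of various sizes can be illustrated with a minimal sketch. It assumes that the varied parameter is the analysis frame length and uses per-frame log energy as a stand-in feature; the abstract specifies neither, and the function names, the 50% overlap, and the synthetic test tone are hypothetical choices made only for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame, floor=1e-10):
    """Natural log of the frame energy, floored to avoid log(0) on silence."""
    return math.log(max(sum(x * x for x in frame), floor))

def features_for_sizes(samples, frame_lens, overlap=0.5):
    """Compute one log-energy track per candidate frame length (assumed parameter)."""
    tracks = {}
    for n in frame_lens:
        hop = max(1, int(n * (1 - overlap)))  # hop size derived from the overlap
        tracks[n] = [log_energy(f) for f in frame_signal(samples, n, hop)]
    return tracks

# Synthetic 1-second, 440 Hz tone sampled at 8 kHz (illustrative input only).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

# Frame lengths of 160, 200 and 256 samples (20, 25 and 32 ms at 8 kHz).
feats = features_for_sizes(tone, frame_lens=(160, 200, 256))
for n, track in feats.items():
    print(n, len(track))
```

Each frame length yields a feature track of a different length and time resolution. A later selection stage, such as the fuzzy optimization mentioned in the abstract, would then choose among these candidates; that stage is not sketched here.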
