An energy search approach to variable frame rate front-end processing for robust ASR

Robustness to environmental noise of various types and levels has been the subject of extensive research over the past several years, yet it remains one of the main problems facing automatic speech recognition (ASR) systems. This paper describes a new variable frame rate analysis technique based on searching a predefined lookahead interval for the next frame position that maximizes the first-order difference of the log energy (ΔE) between consecutive frames. The application of this technique to noise-robust ASR front-end processing is also reported. Compared with existing variable frame rate methods in the literature, the proposed energy search approach is simpler and achieves similar recognition accuracy improvements at lower complexity. Experiments on the Aurora II connected digits database show that the proposed front-end, together with cumulative distribution mapping, achieves average digit recognition accuracies of 78.32% for a model set trained on clean data and 89.95% for a model set trained on data with multiple noise conditions, representing 6.1% and 2.3% reductions in word error rate, respectively, over a cumulative distribution mapping baseline.
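The energy search idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length, the lookahead bounds, the search step, the use of the magnitude of ΔE, and the tie-breaking rule are all assumptions chosen for illustration (the defaults correspond to a 25 ms frame and a 5 to 20 ms lookahead at an 8 kHz sampling rate).

```python
import math


def log_energy(signal, start, frame_len):
    """Log energy of the frame that begins at sample index `start`."""
    frame = signal[start:start + frame_len]
    energy = sum(x * x for x in frame)
    return math.log(energy + 1e-10)  # small floor avoids log(0)


def select_frames(signal, frame_len=200, min_shift=40, max_shift=160, step=8):
    """Choose frame start positions by searching a lookahead interval.

    Starting from the current frame, every candidate position between
    min_shift and max_shift samples ahead is examined. The candidate whose
    log energy differs most (in magnitude, an assumption) from that of the
    current frame becomes the next frame. Ties go to the earliest candidate.
    All default parameter values are hypothetical.
    """
    positions = [0]
    current = 0
    while True:
        e_cur = log_energy(signal, current, frame_len)
        best_pos, best_delta = None, -1.0
        for shift in range(min_shift, max_shift + 1, step):
            cand = current + shift
            if cand + frame_len > len(signal):
                break  # candidate frame would run past the end of the signal
            delta = abs(log_energy(signal, cand, frame_len) - e_cur)
            if delta > best_delta:
                best_pos, best_delta = cand, delta
        if best_pos is None:
            break  # no candidate fits: stop
        positions.append(best_pos)
        current = best_pos
    return positions


if __name__ == "__main__":
    # Synthetic input: near-silence followed by a loud segment.
    demo = [0.001] * 300 + [1.0] * 700
    print(select_frames(demo))
```

Because the frame shift is bounded by `min_shift` and `max_shift`, the frame rate varies only within a fixed range, and each step costs one energy computation per candidate position. Feature extraction (for example, cepstral analysis) would then be run only at the selected positions.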
