Approach of features with confident weight for robust speech recognition

Speech enhancement has become one of the focal points of automatic speech recognition (ASR) development. In recent studies, the missing feature approach (MFA) has proved to be a suitable method. However, the hard-mask decision in the MFA is mostly a rough binary classification based on a fixed threshold, which can lead to failed reliability decisions and carries the risk of screening out genuine speech. As improvements over the hard mask, soft-mask methods, including soft masks combined with a Bayesian classifier, attempt to compensate for the loss of real speech caused by the hard-mask decision by estimating the probability density function (p.d.f.) of the unreliable feature component. Unfortunately, this is a very difficult task because of the overlap of at least two complex random processes, and the sigmoid function adopted by some soft masks is not a reasonable p.d.f. In this paper, we analyze the confidence degree of a feature component in a subband according to four criteria and propose four types of confident weight (CW). Based on these CWs, we introduce four classes of approaches of feature with confident weight (AFCWs), which estimate the confidence degree of each feature vector simply and efficiently, describe the effect of noise in a rigorous manner, and eliminate both the risk of selecting thresholds and the difficulty of finding a joint p.d.f. of reliable and unreliable components. Experimental results show that the proposed approaches improve the performance of ASR systems even in adverse environments.
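To make the contrast between a thresholded hard mask and a graded reliability weight concrete, the following Python sketch computes both from a per-subband SNR estimate on a toy spectrogram. It is a minimal illustration only: the function names (`local_snr`, `hard_mask`, `soft_mask`), the 0 dB threshold, and the sigmoid slope are assumptions for demonstration, not the paper's actual confident-weight (CW) definitions.

```python
import numpy as np

def local_snr(speech_power, noise_power, eps=1e-10):
    """Per time-frequency-cell SNR estimate in dB (illustrative)."""
    return 10.0 * np.log10(speech_power / (noise_power + eps) + eps)

def hard_mask(snr_db, threshold_db=0.0):
    """Binary reliability mask: 1 = reliable, 0 = unreliable.
    A single threshold decides reliability, which is the screening
    risk the abstract describes."""
    return (snr_db > threshold_db).astype(float)

def soft_mask(snr_db, slope=0.5, midpoint_db=0.0):
    """Sigmoid-shaped soft weight in [0, 1]; some soft-mask methods
    use such a curve as a stand-in for the reliability p.d.f."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint_db)))

# Toy example: 4 subbands x 3 frames of spectral power
speech = np.array([[4.0, 1.0, 0.2],
                   [3.0, 0.5, 0.1],
                   [2.0, 2.5, 0.3],
                   [1.0, 0.2, 0.05]])
noise = np.full_like(speech, 0.5)  # assumed flat noise floor

snr = local_snr(speech, noise)
print(hard_mask(snr))              # abrupt 0/1 reliability decisions
print(soft_mask(snr))              # graded, threshold-free weights
```

Cells whose SNR sits near the threshold flip between 0 and 1 in the hard mask but receive intermediate values in the soft weighting, which is the behavior the CW-based approaches exploit without committing to a sigmoid as a p.d.f.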
