From missing data to maybe useful data: soft data modelling for noise robust ASR

Much research has focused on achieving automatic speech recognition (ASR) that approaches human performance in its robustness to noise and channel distortion. We present a new approach to data modelling that has the potential to combine complementary state-of-the-art techniques for speech enhancement and noise adaptation into a single process. In the "missing feature theory" (MFT) approach to noise robust ASR, misinformative spectral data are detected and then ignored. Recent work has shown that MFT ASR improves greatly when the usual hard decision to exclude data features is softened into a continuous weighting between the likelihood contributions that MFT normally uses for "clean" and "missing" data. The model presented here can be seen as a generalisation of this "soft missing data" approach: the mixture pdf that is implicitly used to model clean or missing observation data is recognised as the data posterior pdf and modelled accordingly. Initial "soft data" experiments compare the performance of different soft missing data models against a baseline Gaussian mixture HMM on the Aurora 2.0 task for speaker-independent continuous digit recognition.
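
The continuous weighting described above can be illustrated with a minimal sketch. It is not the model proposed in this paper. It assumes a single diagonal-covariance Gaussian per state, non-negative spectral energy features, and a "missing" contribution given by the bounded marginal that is common in the missing data literature (the clean value is assumed to lie somewhere in [0, x]). The function names and the per-feature weight w (an estimate of the probability that the feature is clean) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_cdf(x, mean, var):
    """Cumulative distribution of a univariate Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def soft_feature_likelihood(x, w, mean, var):
    """Blend the 'clean' and 'missing' likelihood contributions for one feature.

    x    : observed spectral energy (assumed > 0)
    w    : weight in [0, 1]; 1 means certainly clean, 0 means certainly missing
    mean, var : parameters of the clean-speech Gaussian (assumed model)
    """
    if x <= 0.0:
        raise ValueError("this sketch assumes strictly positive energy features")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Clean case: evaluate the clean-speech density at the observation.
    clean = gaussian_pdf(x, mean, var)
    # Missing case: the observation only bounds the clean value from above,
    # so average the clean-speech density over the interval [0, x].
    missing = (gaussian_cdf(x, mean, var) - gaussian_cdf(0.0, mean, var)) / x
    return w * clean + (1.0 - w) * missing


def soft_frame_log_likelihood(xs, ws, means, variances):
    """Log-likelihood of one frame, assuming independent features."""
    return sum(
        math.log(soft_feature_likelihood(x, w, m, v))
        for x, w, m, v in zip(xs, ws, means, variances)
    )
```

Setting w to 1 or 0 for every feature recovers the hard clean or missing decision, so the hard-mask MFT case is a special case of this sketch.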
