Data utility modelling for mismatch reduction

In the "missing data" (MD) approach to noise-robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition, sections of spectral data dominated by noise are detected and treated as "missing". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilistic nature of missing data detection. Recent work has shown greatly improved performance with the "soft missing data" (SMD) approach, in which the "missing" status of each data value is represented by a continuous probability rather than a 0/1 value. This probability is then used to weight the different likelihood contributions which the MD model normally assigns to each spectral observation according to its "missing" status. This article presents an analysis showing that the SMD approach effectively implements a maximum a posteriori (MAP) decoding strategy with missing or uncertain data, subject to the interpretation that the missing/not-missing probabilities are weights for a mixture pdf which models each hidden clean data input, after conditioning on the noisy data input, a local noise estimate, and any other information which may be available. An important feature of this "soft data" model is that control over the "evidence pdf" can provide a principled framework not only for ignoring unreliable data, but also for focusing attention on more discriminative features, and for data enhancement.
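
The weighting described above can be illustrated with a minimal sketch. It is not taken from the article: it assumes a single Gaussian per spectral channel as the clean speech model, and it assumes that the "missing" likelihood contribution is the clean pdf averaged over the interval from 0 to the observed value (a common bounded-marginalisation choice for spectral energies). The function names `soft_md_likelihood` and `frame_log_likelihood` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mean, std):
    """Cumulative distribution of a univariate Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_md_likelihood(x, p_missing, mean, std):
    """Soft missing data likelihood for one spectral channel (sketch).

    x         -- observed (noisy) spectral value, assumed non-negative
    p_missing -- probability in [0, 1] that this value is dominated by noise
    mean, std -- parameters of the clean-speech Gaussian (an assumption)
    """
    # "Present" contribution: the observation is taken to be the clean value.
    present = gaussian_pdf(x, mean, std)

    # "Missing" contribution (assumed form): the clean value lies somewhere
    # in [0, x], so the clean pdf is averaged over that interval.
    if x > 0.0:
        missing = (gaussian_cdf(x, mean, std) - gaussian_cdf(0.0, mean, std)) / x
    else:
        missing = gaussian_pdf(0.0, mean, std)  # limit of the average as x -> 0

    # The continuous probability weights the two contributions.
    return (1.0 - p_missing) * present + p_missing * missing


def frame_log_likelihood(xs, p_missings, means, stds):
    """Log-likelihood of one frame, assuming independent channels."""
    return sum(
        math.log(soft_md_likelihood(x, p, m, s))
        for x, p, m, s in zip(xs, p_missings, means, stds)
    )
```

Setting `p_missing` to 0 or 1 recovers the hard present/missing decisions of the MD approach, while intermediate values interpolate between them. Under the mixture interpretation, this corresponds to an evidence pdf for the hidden clean value that places weight `1 - p_missing` on the observed value and weight `p_missing` on a uniform density over `[0, x]`; that particular choice of mixture components is an assumption of this sketch.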
