Speech recognition with unknown partial feature corruption - a review of the union model

Abstract: This paper summarizes our studies on robust speech recognition based on a new statistical approach, the probabilistic union model. We consider speech recognition in which part of the acoustic features may be corrupted by noise. The union model bases recognition on the clean part of the features, thereby reducing the effect of the noise on recognition. In this aim, the union model resembles the missing feature method, but the two methods reach it by different routes. The missing feature method usually needs to know which data are noisy in order to remove them, whereas the union model combines the local features based on the union of random events, which reduces the model's dependence on information about the noise. We previously investigated applications of the union model to speech recognition with unknown partial corruption in frequency bands, in time duration, and in feature streams. We also studied a combination of the union model with conventional noise-reduction techniques, as a means of dealing with a mixture of known or trainable noise and unknown, unexpected noise. This paper gives a unified review of each of these applications in the context of unknown partial feature corruption, presenting the relevant theory and implementation algorithms along with an experimental evaluation.
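To make the idea of combining local features through a union of events more concrete, the following is a minimal sketch, not the paper's implementation. It assumes one formulation commonly given for the union model: with N local feature streams (for example sub-bands) and an assumed order M, the maximum number of streams that may be corrupted, the score is taken as proportional to the sum, over all choices of N − M streams, of the product of their likelihoods. The function name `union_score`, its parameters, and the toy likelihood values are hypothetical.

```python
from itertools import combinations
from math import prod


def union_score(stream_likelihoods, order):
    """Unnormalised union-model score (illustrative assumption, see lead-in).

    stream_likelihoods: per-stream likelihoods p(o_n | model), one per stream.
    order: assumed maximum number of corrupted streams M, with 0 <= M < N.
    """
    n = len(stream_likelihoods)
    if not 0 <= order < n:
        raise ValueError("order must satisfy 0 <= order < number of streams")
    keep = n - order  # number of streams multiplied together in each term
    # Sum the products over every subset of `keep` streams. No term has to be
    # identified as clean: the all-clean subsets dominate the sum on their own.
    return sum(prod(subset) for subset in combinations(stream_likelihoods, keep))


if __name__ == "__main__":
    # Toy values: the third stream is corrupted for the correct model.
    correct_model = [0.9, 0.8, 1e-6]
    wrong_model = [0.3, 0.2, 0.4]
    for m in (0, 1):
        print(m, union_score(correct_model, m), union_score(wrong_model, m))
```

With order 0 the score reduces to the ordinary product of all stream likelihoods, so a single corrupted stream can pull the correct model's score below that of a competing model. With order 1, the term built from the two clean streams dominates the sum, and the location of the corrupted stream never has to be specified.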
