High-likelihood model based on reliability statistics for robust combination of features: application to noisy speech recognition

This paper introduces a novel statistical approach for combining multiple features that assumes no knowledge of which features are noisy. In a given set of features, some may be dominated by noise. The proposed model handles this uncertainty by deriving the joint probability of the subset of features with the highest probabilities. The core of the model lies in determining the number of features to include in this subset. That number is estimated by calculating the reliability of each feature, defined as its normalized probability, and then evaluating the joint maximal reliability. For the evaluation, we used the TIDIGITS database for connected-digit recognition. The utterances were corrupted by various types of additive noise, so the number and identity of the noisy features varied over time (or changed suddenly). The experimental results show that the high-likelihood model achieves recognition performance similar to that obtained with full a priori knowledge of which features are noisy.
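The subset-size selection described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm, and it is not the paper's exact formulation. Features are treated as independent given the state, so joint log-probabilities are sums. The "normalized probability" is taken as normalization over the candidate states. The "joint maximal reliability" criterion is one hypothetical reading: choose the subset size m for which the best state receives the largest normalized joint probability of its m highest-probability features. The function names (`top_m_log_joint`, `normalise_over_states`, `select_subset_size`) and the toy log-probabilities are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def top_m_log_joint(log_probs: Sequence[float], m: int) -> float:
    """Joint log-probability of the m highest-probability features.

    Assumption: features are independent given the state, so the joint
    log-probability is the sum of the individual log-probabilities.
    """
    return sum(sorted(log_probs, reverse=True)[:m])


def normalise_over_states(scores: Sequence[float]) -> List[float]:
    """Convert per-state log-scores into values that sum to one.

    A numerically stable softmax; here it plays the role of the
    (assumed) normalization over candidate states.
    """
    peak = max(scores)
    weights = [math.exp(x - peak) for x in scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_subset_size(
    log_probs: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Hypothetical reliability criterion for choosing the subset size.

    log_probs[s][n] is log p(o_n | s) for state s and feature n.
    For each candidate size m, the top-m joint probabilities are
    normalized over states, and the m whose best state gets the highest
    normalized value is kept. Returns m and the per-state scores.
    """
    n_features = len(log_probs[0])
    best_m, best_rel = 1, -1.0
    for m in range(1, n_features + 1):
        joint = [top_m_log_joint(row, m) for row in log_probs]
        rel = max(normalise_over_states(joint))
        if rel > best_rel:  # strict '>' keeps the smallest m on ties
            best_m, best_rel = m, rel
    scores = [top_m_log_joint(row, best_m) for row in log_probs]
    return best_m, scores


if __name__ == "__main__":
    # Toy example: two states, four features; the fourth feature is
    # corrupted and wrongly favours state B.
    toy = [
        [-1.0, -1.0, -1.0, -9.0],  # state A (correct state)
        [-4.0, -4.0, -4.0, -1.0],  # state B
    ]
    print(select_subset_size(toy))  # → (3, [-3.0, -9.0])
```

In this toy case the sketch selects m = 3, which drops the corrupted feature from both states' scores. Using all four features would still favour state A, but only by a margin of 1 in log-probability instead of 6. Summing the top-m log-probabilities per state reflects the "subset of features with the highest probabilities" idea from the abstract. The softmax-based criterion is only a stand-in for the paper's reliability measure.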
