Enhanced posteriors bias prediction for robust multi-stream ASR combining voicing and estimate reliabilities

We discuss the fusion of speech and phoneme estimate reliabilities in a multi-stream Automatic Speech Recognizer (ASR) to improve ASR robustness. The Full Combination (FC) approach decomposes the full-band posterior probability of each phoneme into a reliability-weighted sum of the posteriors obtained from each combination of streams. Previous studies show that the weighting factors in FC should take into account not only the reliability of the speech signal but also the intrinsic efficiency of the subband experts. To control these two variables for each combination of posteriors, we derive a new model called “Posteriors Bias Prediction” (PBP), inspired by the Shannon correction system. We show that FC is a specific type of PBP, and that PBP allows the integration of stream reliability based on the voicing level R (correlated with the signal-to-noise ratio, SNR) and on the phoneme's class. Tests on free-format telephone digits (Numbers95) under various noise types and SNR levels demonstrate that PBP outperforms FC, J-RASTA, and spectral subtraction methods.
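The weighted-sum decomposition that FC relies on can be illustrated with a minimal Python sketch. It covers only the generic FC combination step, not the PBP model, whose form the abstract does not give. The function names, the two-stream setup, and all posterior and weight values are hypothetical and chosen for illustration only; in particular, the weights here are fixed numbers, whereas the abstract describes weights driven by the voicing level R and the phoneme class.

```python
from itertools import combinations


def stream_combinations(n_streams):
    """Return all 2**n_streams subsets of stream indices, empty set included."""
    return [c for r in range(n_streams + 1)
            for c in combinations(range(n_streams), r)]


def full_combination(posteriors, weights):
    """Reliability-weighted sum of per-combination phoneme posteriors.

    posteriors: one list per stream combination, each a distribution
                over the same phoneme set.
    weights:    one non-negative reliability weight per combination;
                they are normalized here so that they sum to 1.
    """
    if len(posteriors) != len(weights):
        raise ValueError("need exactly one weight per stream combination")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    n_phonemes = len(posteriors[0])
    return [
        sum(w / total * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_phonemes)
    ]


# Hypothetical example: 2 streams give 4 combinations; 3 phoneme classes.
combos = stream_combinations(2)      # (), (0,), (1,), (0, 1)
posteriors = [
    [0.50, 0.25, 0.25],              # empty combination (assumed: class priors)
    [0.80, 0.10, 0.10],              # expert trained on stream 0 only
    [0.20, 0.60, 0.20],              # expert trained on stream 1 only
    [0.60, 0.30, 0.10],              # expert trained on both streams
]
weights = [0.1, 0.2, 0.3, 0.4]       # illustrative reliabilities, not estimated
combined = full_combination(posteriors, weights)
```

Because each row is a probability distribution and the normalized weights sum to 1, the combined vector is itself a distribution over the phoneme classes.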
