Paper information - Accurate marginalization range for missing data recognition

Accurate marginalization range for missing data recognition

Abstract

Missing data recognition has been proposed to increase the noise robustness of automatic speech recognition. This strategy relies on the use of a spectrographic mask that gives information about the true clean speech energy of a corrupted signal. This information is then used to refine the data process during the decoding step. We propose in this work a new mask that provides more information about the clean speech contribution than classical masks based on Signal to Noise Ratio (SNR) thresholding. The proposed mask is described and compared to another missing data approach based on SNR thresholding. Experimental results show a significant word error rate reduction induced by the proposed approach. Moreover, the proposed mask outperforms the ETSI advanced front-end on the HIWIRE corpus.

Index Terms: robust speech recognition, missing data, bounded marginalization

1. Introduction

The presence of background noise typically causes mismatches between training and testing conditions, which significantly degrade the performance of automatic speech recognizers (ASRs). Over the last decades, many solutions to reduce the effect of noise have been proposed. Acoustic models can be adapted to new noisy conditions, the analysis front-end can be made robust to noise, and noise reduction algorithms can be used as preprocessing stages.

Although many of these methods have shown superior performance in noisy conditions compared to standard speech recognition, noise robustness is still a challenging issue for today's speech recognizers, especially for non-stationary noise.

More recently, speech recognition with missing data has been proposed. This technique relies on a clustering of spectral features into two classes: time-frequency (T-F) units of a noisy speech signal that contain more speech energy than noise energy are classified as reliable data, while T-F units containing more noise energy are classified as missing data. Hence, the resulting clustering produces a binary mask that is exploited in missing data recognition techniques [1].
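The reliable/missing classification described above can be illustrated with a minimal sketch. It assumes an "oracle" setting in which the clean speech energy and the noise energy of each T-F unit are known separately, which is not the case in a real recognizer, where the mask has to be estimated. The function names, the `threshold_db` parameter, and the toy energies are hypothetical and do not come from the paper; a threshold of 0 dB corresponds to "more speech energy than noise energy".

```python
import math


def snr_db(speech_energy, noise_energy, eps=1e-12):
    """Local SNR of one T-F unit in dB (eps avoids division by zero)."""
    return 10.0 * math.log10((speech_energy + eps) / (noise_energy + eps))


def binary_mask(speech, noise, threshold_db=0.0):
    """Build a binary missing data mask by SNR thresholding.

    speech, noise: lists of frames, each frame a list of per-band energies
    (assumed known here; this is an oracle mask, not an estimated one).
    Returns True for reliable units and False for missing units.
    """
    return [
        [snr_db(s, n) > threshold_db for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech, noise)
    ]


# Toy example: 2 frames x 3 frequency bands (illustrative values only).
speech = [[4.0, 1.0, 0.5], [0.2, 3.0, 1.0]]
noise = [[1.0, 2.0, 0.5], [1.0, 1.0, 4.0]]
mask = binary_mask(speech, noise)
```

In this sketch a unit whose speech and noise energies are equal (0 dB) is treated as missing, since it does not contain strictly more speech than noise; that tie-breaking choice is an assumption.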

Jean Paul Haton | Christophe Cerisara | Sébastien Demange

[1] Petros Maragos, et al. Towards Speaker and Environmental Robustness in ASR: The HIWIRE Project, 2006.

[2] Andrew C. Morris. Data utility modelling for mismatch reduction, 2001.

[3] Naveen Parihar, et al. Analysis of the Aurora large vocabulary evaluations, 2003, INTERSPEECH.

[4] Jon Barker, et al. Soft decisions in missing data techniques for robust automatic speech recognition, 2000, INTERSPEECH.

[5] Jean Paul Haton, et al. Missing data mask models with global frequency and temporal constraints, 2006, INTERSPEECH.

[6] Richard M. Stern, et al. Reconstruction of incomplete spectrograms for robust speech recognition, 2000.

[7] Jean Paul Haton, et al. On noise masking for automatic missing data speech recognition: A survey and discussion, 2007, Comput. Speech Lang.