Abstract
Missing data recognition has been proposed to increase the noise robustness of automatic speech recognition. This strategy relies on a spectrographic mask that provides information about the true clean speech energy of a corrupted signal. This information is then used to refine how the data are processed during decoding. In this work, we propose a new mask that provides more information about the clean speech contribution than classical masks based on Signal-to-Noise Ratio (SNR) thresholding. The proposed mask is described and compared to another missing data approach based on SNR thresholding. Experimental results show a significant word error rate reduction with the proposed approach. Moreover, the proposed mask outperforms the ETSI advanced front-end on the HIWIRE corpus.
Index Terms: robust speech recognition, missing data, bounded marginalization

1. Introduction
The presence of background noise typically causes mismatches between training and testing conditions, which significantly degrade the performance of automatic speech recognizers (ASRs). Over the last decades, many solutions to reduce the effect of noise have been proposed: acoustic models can be adapted to new noisy conditions, the analysis front-end can be made robust to noise, and noise reduction algorithms can be used as preprocessing stages.

Although many of these methods have shown superior performance in noisy conditions compared to standard speech recognition, noise robustness is still a challenging issue for today's speech recognizers, especially in non-stationary noise.

More recently, speech recognition with missing data has been proposed. This technique relies on clustering spectral features into two classes: time-frequency (T-F) units of a noisy speech signal that contain more speech energy than noise energy are classified as reliable data, while T-F units containing more noise energy are classified as missing data. The resulting clustering therefore produces a binary mask that is exploited by missing data recognition techniques [1].
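As an illustration of the classical SNR-thresholding mask described above, the following minimal sketch labels each T-F unit as reliable (1) or missing (0) by comparing its local SNR with a threshold. It assumes that the speech and noise energies of every unit are known separately (an oracle, or a priori, mask); a real recognizer has to estimate them from the noisy signal. The function name `snr_binary_mask`, the list-of-lists layout (rows = frequency channels, columns = frames) and the 0 dB default threshold are hypothetical illustrative choices, not taken from this paper.

```python
import math
from typing import List

# Hypothetical layout: rows are frequency channels, columns are frames.
Matrix = List[List[float]]


def snr_binary_mask(speech_energy: Matrix, noise_energy: Matrix,
                    threshold_db: float = 0.0,
                    eps: float = 1e-12) -> List[List[int]]:
    """Hypothetical oracle mask: 1 = reliable T-F unit, 0 = missing T-F unit.

    Assumes speech and noise energies are known per unit (not the case
    in practice, where they must be estimated from the noisy signal).
    A unit is reliable when its local SNR strictly exceeds threshold_db;
    with the default 0 dB this means speech energy > noise energy.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            # Local SNR in dB; eps guards against log(0) and division by zero.
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    speech = [[4.0, 0.5, 2.0], [0.1, 3.0, 1.0]]
    noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(snr_binary_mask(speech, noise))  # [[1, 0, 1], [0, 1, 0]]
```

With the 0 dB default, a unit is reliable exactly when its speech energy exceeds its noise energy, which matches the clustering rule stated in the introduction; a unit with equal energies is labelled missing. Raising `threshold_db` makes the mask more conservative, marking fewer units as reliable.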
References
[1] Petros Maragos et al., "Towards Speaker and Environmental Robustness in ASR: The HIWIRE Project," 2006.
[2] Andrew C. Morris, "Data utility modelling for mismatch reduction," 2001.
[3] Naveen Parihar et al., "Analysis of the Aurora large vocabulary evaluations," INTERSPEECH, 2003.
[4] Jon Barker et al., "Soft decisions in missing data techniques for robust automatic speech recognition," INTERSPEECH, 2000.
[5] Jean Paul Haton et al., "Missing data mask models with global frequency and temporal constraints," INTERSPEECH, 2006.
[6] Richard M. Stern et al., "Reconstruction of incomplete spectrograms for robust speech recognition," 2000.
[7] Jean Paul Haton et al., "On noise masking for automatic missing data speech recognition: A survey and discussion," Comput. Speech Lang., 2007.