Missing-Data Techniques: Feature Reconstruction

Automatic speech recognition (ASR) performance degrades rapidly as the level of noise corrupting the speech increases. Missing-data techniques (MDTs) are a family of methods for noise-robust speech recognition based on the so-called missing-data assumption proposed in [1]. MDTs assume that (i) prior to decoding, the noisy speech signal can be divided into speech-dominated (reliable) and noise-dominated (unreliable) spectro-temporal components, and (ii) the unreliable components retain no information about the corresponding clean speech values. The clean speech values at the noise-dominated locations are therefore effectively missing, and recognition must proceed with partially observed data.

Techniques for speech recognition with missing features fall roughly into two categories: marginalization and feature reconstruction. The marginalization approach, discussed in Chapter ??, disregards the missing components when calculating the acoustic model likelihoods: the likelihood terms that correspond to the missing components are obtained by integrating over the full range of possible values of the missing features [2, 3]. In this chapter, we focus on the reconstruction approach, in which the missing values are substituted (imputed) with clean speech estimates before the acoustic model likelihoods are calculated [4, 5, 6]. Since the reconstructed features contain no missing data, the likelihood calculation does not need to be modified.

In general, missing-feature imputation methods use a model of clean speech to estimate the missing values. These models range from simple smoothness assumptions [6] to advanced statistical models and exemplar-based approaches; the acoustic models of the recognizer itself may also serve this purpose. Given the clean speech model and a noisy observation, the missing features are estimated as the values that best agree with the clean speech model at the missing locations.
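The difference between marginalization and reconstruction can be illustrated with a small NumPy sketch. It is not taken from any of the cited methods and rests on several simplifying assumptions: the features are log power spectral values of a single frame, the noise power is assumed known, the mask comes from thresholding an SNR estimate obtained by plain spectral subtraction (the hypothetical `snr_mask`, with a 0 dB threshold), and the clean speech model is a single Gaussian rather than a mixture or an HMM. The hypothetical `marginal_loglik` illustrates marginalization: for a diagonal Gaussian, integrating a missing dimension over the whole real line contributes a factor of one, so only the reliable dimensions are scored. The hypothetical `impute_gaussian` illustrates reconstruction: unreliable values are replaced by their conditional mean given the reliable ones, producing a complete vector that an unmodified likelihood computation can use. Clipping the estimate to the observed value reflects that, with additive noise and ignoring phase effects, clean speech energy cannot exceed noisy energy; it is only a crude substitute for a proper bounded estimate.

```python
import numpy as np


def snr_mask(noisy_pow, noise_pow, threshold_db=0.0):
    """Hypothetical mask: True (reliable) where the estimated local SNR
    exceeds threshold_db. Speech power is estimated by spectral subtraction."""
    speech_pow = np.maximum(noisy_pow - noise_pow, 1e-12)
    snr_db = 10.0 * np.log10(speech_pow / np.maximum(noise_pow, 1e-12))
    return snr_db > threshold_db


def marginal_loglik(y, reliable, mean, var):
    """Marginalization with a diagonal Gaussian: missing dimensions integrate
    to one, so only the reliable dimensions enter the log-likelihood."""
    y, mean, var = (np.asarray(a, dtype=float) for a in (y, mean, var))
    d = y[reliable] - mean[reliable]
    v = var[reliable]
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + d * d / v)


def impute_gaussian(y, reliable, mean, cov):
    """Reconstruction with one full-covariance Gaussian: replace unreliable
    values by E[x_u | x_r = y_r], then clip to the observed noisy value
    (a simplification of bounded imputation)."""
    y, mean = np.asarray(y, dtype=float), np.asarray(mean, dtype=float)
    r, u = reliable, ~reliable
    x = y.copy()
    if not u.any():
        return x
    if r.any():
        # Sigma_ur Sigma_rr^{-1} (y_r - mu_r), computed via a linear solve
        w = np.linalg.solve(cov[np.ix_(r, r)], y[r] - mean[r])
        x[u] = mean[u] + cov[np.ix_(u, r)] @ w
    else:
        x[u] = mean[u]
    x[u] = np.minimum(x[u], y[u])  # clean energy <= noisy energy
    return x


# Usage with hypothetical values (three frequency channels, one frame)
noisy_pow = np.array([10.0, 1.5, 4.0])
noise_pow = np.array([1.0, 1.0, 1.0])
reliable = snr_mask(noisy_pow, noise_pow)   # middle channel is unreliable
y = np.log(noisy_pow)                       # log-spectral features
mean = np.log(np.array([8.0, 3.0, 3.0]))    # hypothetical clean speech mean
cov = np.array([[1.0, 0.8, 0.5],
                [0.8, 1.0, 0.8],
                [0.5, 0.8, 1.0]])
x_hat = impute_gaussian(y, reliable, mean, cov)
score = marginal_loglik(y, reliable, mean, np.diag(cov))
```

With these values the conditional mean for the middle channel exceeds the observed log power, so the bound takes effect and the estimate is clipped to the noisy value.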

References

[1] Bert Cranen et al. Using sparse representations for missing data imputation in noise robust speech recognition. 16th European Signal Processing Conference, 2008.

[2] Hugo Van hamme et al. Robust speech recognition using missing data techniques in the prospect domain and fuzzy masks. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

[3] Li Deng et al. HMM adaptation using vector Taylor series for noisy speech recognition. INTERSPEECH, 2000.

[4] DeLiang Wang et al. A Supervised Learning Approach to Uncertainty Decoding for Robust Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.

[5] DeLiang Wang et al. Robust speech recognition using multiple prior models for speech reconstruction. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

[6] Richard M. Stern et al. On tracking noise with linear dynamical system models. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.

[7] Daniel P. W. Ellis et al. Towards single-channel unsupervised source separation of speech mixtures: the layered harmonics/formants separation-tracking model. SAPA@INTERSPEECH, 2004.

[8] Richard M. Stern et al. Reconstruction of missing features for robust speech recognition. Speech Communication, 2004.

[9] Phil D. Green et al. Missing data techniques for robust speech recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.

[10] DeLiang Wang et al. Transforming Binary Uncertainties for Robust Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2007.

[11] Hugo Van hamme et al. Handling convolutional noise in missing data automatic speech recognition. INTERSPEECH, 2006.

[12] E. Candès et al. Stable signal recovery from incomplete and inaccurate measurements. arXiv:math/0503066, 2005.

[13] Bert Cranen et al. Sparse imputation for noise robust speech recognition using soft masks. IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.

[14] Bert Cranen et al. Sparse imputation for large vocabulary noise robust ASR. Computer Speech & Language, 2011.

[15] Richard M. Stern et al. Reconstruction of incomplete spectrograms for robust speech recognition. 2000.

[16] Phil D. Green et al. State based imputation of missing data for robust speech recognition and speech enhancement. EUROSPEECH, 1999.

[17] Hirokazu Kameoka et al. Computational auditory induction as a missing-data model-fitting problem with Bregman divergence. Speech Communication, 2011.

[18] Hugo Van hamme et al. Automatic Speech Recognition Using Missing Data Techniques: Handling of Real-World Data. Robust Speech Recognition of Uncertain or Missing Data, 2011.

[19] Mikko Kurimo et al. Missing-Feature Reconstruction With a Bounded Nonlinear State-Space Model. IEEE Signal Processing Letters, 2011.

[20] David Pearce et al. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. INTERSPEECH, 2000.

[21] Stephen P. Boyd et al. An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares. IEEE Journal of Selected Topics in Signal Processing, 2007.

[22] Phil D. Green et al. Handling missing data in speech recognition. ICSLP, 1994.

[23] H. Zou et al. Regularization and variable selection via the elastic net. 2005.

[24] Jean Paul Haton et al. On noise masking for automatic missing data speech recognition: A survey and discussion. Computer Speech & Language, 2007.

[25] Hugo Van hamme. Robust speech recognition using cepstral domain missing data techniques and noisy masks. ICASSP, 2004.

[26] Yoshihiko Nankaku et al. GMM-Based Missing-Feature Reconstruction on Multi-Frame Windows. INTERSPEECH, 2011.

[27] Friedrich Faubel et al. Overcoming the Vector Taylor Series Approximation in Speech Feature Enhancement - A Particle Filter Approach. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), 2007.

[28] Abeer Alwan et al. Utilizing Compressibility in Reconstructing Spectrographic Data, With Applications to Noise Robust ASR. IEEE Signal Processing Letters, 2009.

[29] Andrzej Drygajlo et al. Speaker verification in noisy environments with combined spectral subtraction and missing feature theory. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), 1998.

[30] Jon Barker et al. Robust ASR based on clean speech models: an evaluation of missing data techniques for connected digit recognition in noise. INTERSPEECH, 2001.

[31] Juha Karhunen et al. State Inference in Variational Bayesian Nonlinear State-Space Models. ICA, 2006.

[32] R. G. Leonard et al. A database for speaker-independent digit recognition. ICASSP, 1984.

[33] Mark J. F. Gales et al. Robust continuous speech recognition using parallel model combination. IEEE Transactions on Speech and Audio Processing, 1996.

[34] Phil D. Green et al. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication, 2001.

[35] Mark J. F. Gales et al. Transforming features to compensate speech recogniser models for noise. INTERSPEECH, 2009.

[36] Friedrich Faubel et al. Based soft-mask estimation for missing feature reconstruction. 2008.

[37] Hirokazu Kameoka et al. Computational auditory induction by missing-data non-negative matrix factorization. SAPA@INTERSPEECH, 2008.

[38] Daniel D. Lee et al. Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines. NIPS, 2002.

[39] Shrikanth Narayanan et al. Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End. IEEE Transactions on Audio, Speech, and Language Processing, 2011.

[40] Hugo Van hamme et al. Advances in Missing Feature Techniques for Robust Large-Vocabulary Continuous Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2011.

[41] H. Van hamme et al. Robust speech recognition using cepstral domain missing data techniques and noisy masks. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.

[42] Hugo Van hamme et al. Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition. IEEE Journal of Selected Topics in Signal Processing, 2010.

[43] Juha Häkkinen et al. On the Use of Missing Feature Theory with Cepstral Features. 2022.

[44] Abeer Alwan et al. HMM-Based Reconstruction of Unreliable Spectrographic Data for Noise Robust Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2010.

[45] Patti Price et al. The DARPA 1000-word resource management database for continuous speech recognition. ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing, 1988.

[46] Hugo Van hamme. Robust speech recognition using missing feature theory in the cepstral or LDA domain. INTERSPEECH, 2003.

[47] Paris Smaragdis et al. Missing data imputation for spectral audio signals. IEEE International Workshop on Machine Learning for Signal Processing, 2009.

[48] Jon Barker et al. Soft decisions in missing data techniques for robust automatic speech recognition. INTERSPEECH, 2000.

[49] B. Raj et al. Reconstructing spectral vectors with uncertain spectrographic masks for robust speech recognition. IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.

[50] Friedrich Faubel et al. Bounded conditional mean imputation with Gaussian mixture models: A reconstruction approach to partly occluded features. IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.

[51] Michael Picheny et al. Speech recognition using noise-adaptive prototypes. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989.

[52] Tuomas Virtanen et al. Noise robust exemplar-based connected digit recognition. IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.

[53] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. 2006.

[54] Trevor Hastie et al. Imputing Missing Data for Gene Expression Arrays. 2001.

[55] Ulpu Remes et al. Observation uncertainty measures for sparse imputation. INTERSPEECH, 2010.

[56] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[57] Matthew Brand et al. Incremental Singular Value Decomposition of Uncertain Data with Missing Values. ECCV, 2002.

[58] Naveen Parihar et al. Analysis of the Aurora large vocabulary evaluations. INTERSPEECH, 2003.

[59] Hugo Van hamme et al. PROSPECT features and their application to missing data techniques for robust speech recognition. INTERSPEECH, 2004.

[60] Bert Cranen et al. Missing data imputation using compressive sensing techniques for connected digit recognition. 16th International Conference on Digital Signal Processing, 2009.

[61] Janet M. Baker et al. The Design for the Wall Street Journal-based CSR Corpus. HLT, 1992.