Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End

Missing data techniques (MDTs) are widely used to improve speech recognition under noisy conditions. This paper presents a technique that improves on previously proposed sparse imputation methods based on the least absolute shrinkage and selection operator (LASSO), which is widely used in compressive sensing. LASSO, however, does not satisfy the oracle properties when the dictionary is highly collinear, as is the case for features extracted from most speech corpora. A variable selection procedure satisfies the oracle properties if it performs as well as if the true underlying model were known. Through experiments on the Aurora 2.0 noisy spoken digits database, we demonstrate that the Least Angle Regression implementation of the Elastic Net (LARS-EN) better exploits the properties of a collinear dictionary and is therefore significantly more robust than LASSO in basis selection on the continuous digit recognition task with an estimated mask. We also investigate how a good measure of sparsity affects speech recognition rates. In particular, we demonstrate that a good measure of sparsity greatly improves recognition rates, and that the LARS modification of LASSO and LARS-EN can be terminated early to improve recognition results, even though the estimation error increases.
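The difference in basis selection on a collinear dictionary can be illustrated with a small sketch. It is not the LARS-EN implementation evaluated in the paper: it swaps in plain coordinate descent, and the objective scaling, the toy dictionary `A`, the mask `reliable`, the penalty values and the function names are all assumptions chosen for illustration. The sparse code is fitted on the reliable feature components only, and the unreliable components are then replaced by the dictionary reconstruction.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t (the proximal step of the l1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(A, y, l1, l2, n_iter=200):
    """Minimise 0.5*||y - A x||^2 + l1*||x||_1 + 0.5*l2*||x||^2.

    Coordinate descent stand-in for LARS-EN (assumption); l2 = 0 gives LASSO.
    A is a list of rows, y a list of observations.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(y)  # equals y - A x for x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_iter):
        for j in range(n_cols):
            # Correlation of atom j with the residual that excludes atom j.
            rho = sum(A[i][j] * residual[i] for i in range(n_rows)) + col_sq[j] * x[j]
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= A[i][j] * delta
                x[j] = new
    return x


def impute(A, y, reliable, l1, l2):
    """Fit on reliable components, then fill unreliable ones with A x."""
    rows = [i for i, ok in enumerate(reliable) if ok]
    x = elastic_net_cd([A[i] for i in rows], [y[i] for i in rows], l1, l2)
    recon = [sum(a * c for a, c in zip(row, x)) for row in A]
    return x, [y[i] if reliable[i] else recon[i] for i in range(len(y))]


# Toy dictionary: rows are feature components, columns are atoms.
# Atoms 0 and 1 are identical, i.e. perfectly collinear (illustrative only).
A = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0],
     [2.0, 2.0, 1.0]]
y = [1.0, 0.0, 1.0, 5.0]               # last component corrupted by noise
reliable = [True, True, True, False]   # hypothetical estimated mask

x_lasso, y_lasso = impute(A, y, reliable, l1=0.1, l2=0.0)
x_en, y_en = impute(A, y, reliable, l1=0.1, l2=0.5)
print("LASSO coefficients:      ", x_lasso)
print("Elastic net coefficients:", x_en)
print("Imputed (LASSO):         ", y_lasso)
print("Imputed (elastic net):   ", y_en)
```

With identical atoms the LASSO solution is not unique, and this solver happens to put essentially all of the weight on the first atom it visits. The quadratic term makes the elastic net solution unique and spreads the weight equally over the collinear atoms, which is the grouping behaviour that makes basis selection less arbitrary. The extra shrinkage also changes the imputed value, so better selection does not by itself imply lower estimation error.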
