Utilizing Compressibility in Reconstructing Spectrographic Data, With Applications to Noise Robust ASR

In this letter, we propose a novel algorithm for reconstructing unreliable spectrographic data, a method applicable to missing feature-based automatic speech recognition (ASR). We provide quantitative analysis illustrating the high compressibility of spectrographic speech data. The existence of sparse representations for spectrographic data motivates posing spectral reconstruction as an optimization problem that minimizes the ℓ1-norm. When applied to the Aurora-2 database, the proposed missing feature estimation algorithm is shown to provide significant improvements in recognition accuracy relative to the baseline MFCC system. Even without an oracle mask, performance approaches that of the ETSI advanced front end (AFE), with less complexity.
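To make the idea of ℓ1-based reconstruction concrete, the following is a minimal sketch and not the algorithm of the letter itself. It assumes a DCT basis as the sparsifying transform, replaces the constrained ℓ1 problem with a penalized (lasso) form, and solves it with iterative soft thresholding (ISTA). The function names, the toy signal, the reliability mask, and the parameters `lam` and `iters` are all hypothetical choices made for illustration.

```python
import math
import random


def dct_basis(n):
    """Return the n x n orthonormal inverse-DCT matrix B, so that x = B c."""
    basis = []
    for t in range(n):
        row = []
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            row.append(scale * math.cos(math.pi * (t + 0.5) * k / n))
        basis.append(row)
    return basis


def soft_threshold(value, lam):
    """Shrink value toward zero by lam (proximal operator of the l1-norm)."""
    return math.copysign(max(abs(value) - lam, 0.0), value)


def reconstruct(observed, reliable, lam=0.01, iters=500):
    """Estimate unreliable entries by approximately minimizing
    0.5 * ||A c - y||^2 + lam * ||c||_1 with ISTA, where A holds the
    rows of the DCT basis at reliable positions (an assumed setup)."""
    n = len(observed)
    basis = dct_basis(n)
    rows = [t for t in range(n) if reliable[t]]
    coeffs = [0.0] * n
    for _ in range(iters):
        # Residual on the reliable components only.
        resid = {t: observed[t] - sum(basis[t][k] * coeffs[k] for k in range(n))
                 for t in rows}
        # Gradient step followed by shrinkage. A step size of 1 is valid
        # because the selected rows of an orthonormal matrix have norm <= 1.
        coeffs = [soft_threshold(coeffs[k] + sum(basis[t][k] * resid[t] for t in rows), lam)
                  for k in range(n)]
    estimate = [sum(basis[t][k] * coeffs[k] for k in range(n)) for t in range(n)]
    # Keep reliable observations untouched; fill in only the unreliable ones.
    return [observed[t] if reliable[t] else estimate[t] for t in range(n)]


def demo(n=32, n_missing=12, seed=0):
    """Build a toy vector that is 2-sparse in the DCT domain, hide some
    entries, and reconstruct them. All values here are illustrative."""
    basis = dct_basis(n)
    true_coeffs = [0.0] * n
    true_coeffs[1], true_coeffs[4] = 3.0, -2.0
    clean = [sum(basis[t][k] * true_coeffs[k] for k in range(n)) for t in range(n)]
    missing = set(random.Random(seed).sample(range(n), n_missing))
    reliable = [t not in missing for t in range(n)]
    observed = [clean[t] if reliable[t] else 0.0 for t in range(n)]
    restored = reconstruct(observed, reliable)
    return clean, observed, restored, reliable


if __name__ == "__main__":
    clean, observed, restored, reliable = demo()
    for t in range(len(clean)):
        if not reliable[t]:
            print(f"index {t:2d}: clean {clean[t]: .3f}  restored {restored[t]: .3f}")
```

In a missing feature setting, `reliable` would play the role of the spectrographic mask and `observed` would be one spectral vector. The penalized form is used here only because it is simple to solve without external libraries; an equality-constrained ℓ1 program solved with a convex optimization package would be an equally reasonable sketch.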
