An improved sparse reconstruction algorithm for speech compressive sensing using structured priors

This work addresses sparse reconstruction in compressive sensing (CS) of speech signals. We propose a novel sparse reconstruction algorithm, built on the approximate message passing (AMP) framework, that exploits the intrinsic structure of real-life speech signals in the modified discrete cosine transform (MDCT) domain. A Gaussian mixture model characterizes the marginal distribution of the MDCT coefficients, and a first-order Markov chain model captures the dependencies between neighboring MDCT coefficients. The parameters of both models are learned adaptively with an expectation-maximization (EM) procedure. In reconstruction experiments on real speech signals, the new algorithm performs significantly better than several state-of-the-art algorithms.
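
For readers unfamiliar with the AMP framework that the abstract builds on, the following is a minimal sketch of the basic AMP iteration with a plain soft-thresholding denoiser. It is not the proposed algorithm: the Gaussian mixture prior, the Markov chain coupling between neighboring coefficients, and the EM parameter learning are all omitted. The function names, the threshold factor `alpha`, the i.i.d. Gaussian sensing matrix, and the synthetic sparse vector standing in for MDCT coefficients are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(r, tau):
    """Component-wise soft thresholding, the simplest sparsity-promoting denoiser."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def amp_soft_threshold(A, y, n_iter=50, alpha=1.5):
    """Basic AMP for y = A x with sparse x (illustrative sketch only).

    A      : (m, n) sensing matrix, assumed i.i.d. Gaussian with variance 1/m
    y      : (m,) measurement vector
    alpha  : hypothetical tuning factor; threshold = alpha * residual std
    """
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()  # corrected residual
    for _ in range(n_iter):
        # Pseudo-data: behaves like the true signal plus roughly Gaussian noise.
        r = x + A.T @ z
        # Estimate the effective noise level from the current residual.
        tau = alpha * np.linalg.norm(z) / np.sqrt(m)
        x = soft_threshold(r, tau)
        # Onsager correction: previous residual scaled by (number of nonzeros) / m.
        onsager = (np.count_nonzero(x) / m) * z
        z = y - A @ x + onsager
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 1000, 500, 50  # signal length, measurements, nonzeros (all assumed)
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_true[support] = rng.standard_normal(k)
    y = A @ x_true  # noiseless measurements

    x_hat = amp_soft_threshold(A, y)
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

In a structured-prior variant such as the one the abstract describes, the soft-thresholding step would be replaced by a denoiser derived from the learned prior; the surrounding residual update and Onsager correction keep the same form.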
