Two-level optimization approach with accelerated proximal gradient for objective measures in sparse speech reconstruction

Compressive speech enhancement exploits the sparsity of speech and the non-sparsity of noise in the time-frequency representation to perform speech enhancement. However, reconstructing the sparsest output does not necessarily yield a good enhanced speech signal, because it may introduce speech distortion. This paper proposes a two-level optimization approach that incorporates objective quality measures into compressive speech enhancement. The proposed method combines the accelerated proximal gradient approach with a global one-dimensional optimization method to solve the sparse reconstruction problem. By incorporating objective quality measures into the optimization process, the reconstructed output is not only sparse but also attains the highest objective quality score possible. In other words, the sparse speech reconstruction process becomes a quality-driven sparse speech reconstruction. Experimental results in compressive speech enhancement consistently show improved scores on objective measures in different noisy environments compared with the non-optimized method. Additionally, the proposed optimization converges faster and has lower computational complexity than existing methods.
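The two-level structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the inner problem is an l1-regularized least-squares (LASSO) reconstruction solved with a FISTA-style accelerated proximal gradient iteration, that the outer one-dimensional variable is the regularization weight `lam`, and that a plain logarithmic grid search stands in for the global one-dimensional method. The names `fista_lasso`, `outer_search`, and the `score` callback (a placeholder for an objective measure such as PESQ or STOI) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(A, y, lam, n_iter=200):
    """Inner level: minimize 0.5*||A x - y||^2 + lam*||x||_1 with FISTA."""
    # Lipschitz constant of the gradient of the smooth term.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z = x.copy()          # extrapolated point
    t = 1.0               # momentum parameter
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

def outer_search(A, y, score, lo, hi, n_grid=15):
    """Outer level: pick the lam in [lo, hi] whose reconstruction scores best.

    A log-spaced grid is an assumed stand-in for a global 1-D optimizer;
    `score` maps a reconstruction to a quality value (higher is better).
    """
    lams = np.geomspace(lo, hi, n_grid)
    scores = [score(fista_lasso(A, y, lam)) for lam in lams]
    best = int(np.argmax(scores))
    return lams[best], scores[best]
```

In this sketch, `score` would be replaced by whatever objective quality measure is being optimized, evaluated on the signal synthesized from the reconstructed coefficients. The grid search costs one inner solve per candidate value, so a smarter one-dimensional method would reduce the number of inner solves.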
