Sparse representation-based quasi-clean speech construction for speech quality assessment under complex environments

A non-intrusive speech quality assessment method for complex environments was proposed. At its core is a new sparse representation-based speech reconstruction algorithm that recovers a quasi-clean speech signal from the noisy, degraded signal. First, an overcomplete dictionary of clean-speech power spectra was learned with the K-singular value decomposition (K-SVD) algorithm. Then, in the sparse representation stage, the stopping residual error was set adaptively from the estimated cross-correlation and the noise spectrum, the latter adjusted by an a posteriori SNR-weighted factor, and orthogonal matching pursuit (OMP) was applied to reconstruct the clean speech spectrum from the noisy speech. The quasi-clean speech then served as the reference signal for a modified PESQ perceptual model, and the mean opinion score (MOS) of the noisy degraded speech was obtained by estimating the distortions between the quasi-clean speech and the degraded speech. Experimental results show that the proposed approach achieves a correlation coefficient of 0.925 on the NOIZEUS complex-environment database, about 99% of the performance of the intrusive ITU-T PESQ standard, and outperforms the non-intrusive ITU-T P.563 standard by 7.1%.
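
The reconstruction step pairs a greedy sparse coder with a residual-based stopping rule. The following is a minimal sketch of OMP that stops once the residual energy drops to a noise-dependent threshold. It assumes unit-norm dictionary atoms (as K-SVD produces). The abstract does not give the paper's exact formula for the stopping error, which combines the estimated cross-correlation with an a posteriori SNR-weighted noise spectrum. Here `stopping_error` and its scalar `weight` are therefore hypothetical placeholders, not the authors' rule.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def stopping_error(noise_psd: Sequence[float], weight: float = 1.0) -> float:
    """Hypothetical threshold: allow residual energy comparable to the noise.

    `weight` stands in for the paper's a posteriori SNR-weighted adjustment and
    cross-correlation term, which are NOT reproduced here.
    """
    return weight * dot(noise_psd, noise_psd)


def omp(atoms: Sequence[Sequence[float]], x: Sequence[float],
        eps: float, max_atoms: int) -> Tuple[List[int], Vector]:
    """Orthogonal matching pursuit over unit-norm atoms (one list per atom).

    Stops when ||residual||^2 <= eps or max_atoms atoms are selected.
    Returns the selected atom indices and their least-squares coefficients.
    """
    max_atoms = min(max_atoms, len(atoms))
    residual = list(x)
    selected: List[int] = []
    basis: List[Vector] = []   # orthonormal basis Q of the selected atoms
    r_cols: List[Vector] = []  # column j holds R[0..j, j], with A_sel = Q R
    while dot(residual, residual) > eps and len(selected) < max_atoms:
        # Greedy step: unused atom with the largest |correlation| to the residual.
        k = max((i for i in range(len(atoms)) if i not in selected),
                key=lambda i: abs(dot(atoms[i], residual)))
        # Orthogonalise the chosen atom against Q (modified Gram-Schmidt).
        v = list(atoms[k])
        r: Vector = []
        for q in basis:
            c = dot(q, v)
            r.append(c)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm_v = math.sqrt(dot(v, v))
        if norm_v < 1e-12:  # atom already lies in the selected span
            break
        q_new = [vi / norm_v for vi in v]
        r.append(norm_v)
        basis.append(q_new)
        r_cols.append(r)
        selected.append(k)
        # Residual is orthogonal to the old span; remove its part along q_new.
        proj = dot(q_new, residual)
        residual = [ri - proj * qi for ri, qi in zip(residual, q_new)]
    # Coefficients on the original atoms: solve R c = Q^T x by back substitution.
    b = [dot(q, x) for q in basis]
    n = len(selected)
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(r_cols[j][i] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / r_cols[i][i]
    return selected, coeffs


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [s, s, 0, 0]]
    noisy = [3.0, 0.1, 2.0, 0.0]  # toy "noisy power spectrum"
    noise = [0.0, 0.1, 0.0, 0.0]  # toy noise estimate
    idx, c = omp(atoms, noisy, stopping_error(noise, weight=1.5), max_atoms=4)
    print(idx, c)
```

Stopping on residual energy rather than on a fixed sparsity level lets the coder halt as soon as what remains looks like noise. This is the motivation for tying the threshold to the noise spectrum.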

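The headline result is a correlation coefficient between objective MOS predictions and subjective MOS. The abstract does not name the statistic. Assuming it is the Pearson correlation typically reported when evaluating PESQ and P.563, it can be computed as in this minimal sketch. The helper name `pearson_r` is hypothetical, and the scores in the usage example are invented toy values, not NOIZEUS results.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], subjective: Sequence[float]) -> float:
    """Pearson correlation between objective MOS predictions and subjective MOS."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    cov = sum((p - mean_p) * (q - mean_s) for p, q in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((q - mean_s) ** 2 for q in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_s)


if __name__ == "__main__":
    # Invented scores for five degraded files (illustration only).
    objective = [1.8, 2.4, 2.9, 3.5, 4.1]
    subjective = [1.6, 2.5, 3.0, 3.3, 4.2]
    print(round(pearson_r(objective, subjective), 3))
```
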