A new online Bayesian NMF based quasi-clean speech reconstruction for non-intrusive voice quality evaluation

Abstract Voice quality evaluation under complex environments is an important part of Quality of Service. Non-intrusive evaluation is a challenging problem that has recently attracted increasing attention. Since traditional non-intrusive evaluation has no knowledge of the original clean speech, it is expected to underperform intrusive evaluation. In this paper, a new non-intrusive method based on quasi-clean speech reconstruction and an intrusive model is proposed. To obtain the quasi-clean speech, a new online Bayesian non-negative matrix factorization (NMF) based speech reconstruction algorithm is presented. The noise basis matrix is updated using the noise frames from the online noisy observation, and the quasi-clean speech is reconstructed using the Bayesian NMF in combination with the speech activity probability. The final reconstructed signal serves as the reference for the modified Perceptual Evaluation of Speech Quality (PESQ) model, which yields the quality score of the noisy speech. The experimental results show that the proposed method achieves a correlation of 0.895 on the NOIZEUS and ITU-T P-series Supplement 23 databases, which is 10.1% higher than that of the non-intrusive standard ITU-T P.563.
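
The reconstruction step described above can be illustrated with a minimal sketch. It is not the Bayesian NMF of the paper: it assumes plain KL-divergence NMF with the standard multiplicative updates, assumes the noise frames have already been identified, and omits the speech activity probability and the PESQ stage entirely. The function names `kl_nmf` and `reconstruct_speech`, the basis sizes, and the Wiener-like mask are illustrative assumptions, not details taken from the source.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, rank, n_iter=200, W=None, update_W=True, seed=0):
    """Factorize a non-negative matrix V ~= W @ H with KL multiplicative updates.

    If W is given and update_W is False, only the activations H are estimated.
    (Hypothetical helper; a plain NMF stand-in for the paper's Bayesian NMF.)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS if W is None else W.copy()
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ ones + EPS)
        if update_W:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (ones @ H.T + EPS)
    return W, H


def reconstruct_speech(V_noisy, W_speech, W_noise, n_iter=200):
    """Estimate a quasi-clean magnitude spectrogram from a noisy one.

    Both bases are held fixed; the noisy spectrogram is explained as
    speech part + noise part, and a Wiener-like mask keeps the speech part.
    """
    W = np.hstack([W_speech, W_noise])
    _, H = kl_nmf(V_noisy, W.shape[1], n_iter=n_iter, W=W, update_W=False)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech estimate
    N = W_noise @ H[k:]    # noise estimate
    mask = S / (S + N + EPS)
    return mask * V_noisy


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W_s = rng.random((16, 4))                   # assumed pre-trained speech basis
    noise_frames = rng.random((16, 3)) @ rng.random((3, 20))
    W_n, _ = kl_nmf(noise_frames, rank=3)       # noise basis learned from noise frames
    V = W_s @ rng.random((4, 10)) + noise_frames[:, :10]
    S_hat = reconstruct_speech(V, W_s, W_n)
    print(S_hat.shape)
```

In an online setting, the noise basis would be re-estimated as new noise frames arrive, whereas this sketch learns it once from a fixed batch.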
