An Instrumental Quality Measure for Artificially Bandwidth-Extended Speech Signals

Various studies have shown that the instrumental measures wideband PESQ and POLQA do not reliably predict speech quality for artificial speech bandwidth extension (ABE) test conditions, which were never within their scope. Based on data from a coordinated subjective listening test with 12 ABE variants developed by 6 different institutions and conducted in 4 languages, we propose a novel instrumental quality measure that is specifically suited to narrowband-to-wideband ABE test conditions. Our contributions are fourfold. First, we propose quality indicators that are particularly able to detect ABE-related distortions. Second, we investigate the combination of perceptually and non-perceptually motivated distortion-related statistics. Third, we propose a support-vector-machine-based high-performance MOS predictor for ABE speech quality assessment. Fourth, we present the training process based on the subjective listening test data. A k-fold cross-validation test on 1) disjoint languages, 2) disjoint speakers, and 3) disjoint ABE solutions shows the superiority of our proposed measure over both wideband PESQ and POLQA in the ITU-T-recommended categories of accuracy, consistency, and linearity.
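
The disjoint cross-validation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy labels, and the toy MOS values are all hypothetical, and a simple training-set mean stands in for the support-vector-machine predictor. It is also assumed here that accuracy is measured by the root-mean-square error and linearity by the Pearson correlation, as is common practice; consistency is omitted because it needs per-condition confidence intervals from the listening test.

```python
import math
from collections import defaultdict


def group_disjoint_folds(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group.

    Every sample of the held-out group (e.g. one language, one speaker, or
    one ABE solution) goes to the test set, so train and test never share
    a group.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


def rmse(pred, target):
    """Root-mean-square error between predicted and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy placeholder data (made up for illustration, not from the paper).
    languages = ["en", "en", "de", "de", "zh", "zh", "ko", "ko"]
    mos = [3.1, 2.4, 3.6, 2.9, 3.3, 2.2, 3.0, 2.7]

    for held_out, train_idx, test_idx in group_disjoint_folds(languages):
        # Stand-in predictor: the mean MOS of the training folds.
        # A trained regressor (e.g. an SVM) would be plugged in here.
        baseline = sum(mos[i] for i in train_idx) / len(train_idx)
        pred = [baseline] * len(test_idx)
        target = [mos[i] for i in test_idx]
        print(held_out, round(rmse(pred, target), 3))
```

Holding out whole groups rather than random samples tests whether a predictor generalizes to an unseen language, speaker, or ABE solution instead of one it has already seen during training.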
