A Phonetic Reference Paradigm for Instrumental Speech Quality Assessment of Artificial Speech Bandwidth Extension

Today’s instrumental speech quality measures are limited in their use, as they “do not yet sufficiently include processing steps beyond the periphery of the auditory system” [1]. This becomes particularly obvious when reference-based instrumental methods are used to assess the quality of artificial speech bandwidth extension (ABWE) approaches. While Blauert and Jekosch [1] have not proposed particular schemes, they advocate a model of sound quality that represents layers of abstraction. In fact, once subjects are asked for opinion scores following any of ITU-T’s definitions, they have already understood (or not) what was spoken. It is our firm conviction that in not-too-bad testing conditions this knowledge serves as an internal reference for judging speech quality, which calls for a paradigm shift in reference-based instrumental speech quality measures. Consequently, not only is some (direct wideband) reference speech data useful, but so is a phonetic transcription of the speech, serving as a human-internal representation of what was spoken. The paper presents arguments in support of this thesis, along with a proof that not all sounds are equal, which calls for phoneme-specific processing in future reference-based instrumental speech quality assessment methods.

[1] Patrick Bauer, et al. Impact of hearing impairment on fricative intelligibility for artificially bandwidth-extended telephone speech in noise, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2] Paavo Alku, et al. Evaluation of an Artificial Speech Bandwidth Extension Method in Three Languages, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Patrick Bauer, et al. A statistical framework for artificial bandwidth extension exploiting speech waveform and phonetic transcription, 2009, 2009 17th European Signal Processing Conference.

[4] Paavo Alku, et al. Speech quality prediction for artificial bandwidth extension algorithms, 2013, INTERSPEECH.

[5] Paavo Alku, et al. Bandwidth Extension of Telephone Speech Using a Neural Network and a Filter Bank Implementation for Highband Mel Spectrum, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[6] Paavo Alku, et al. Neural Network-Based Artificial Bandwidth Expansion of Speech, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7] Patrick Bauer, et al. An HMM-based artificial bandwidth extension evaluated by cross-language training and test, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8] Jens Blauert, et al. A Layer Model of Sound Quality, 2012.

[9] Engin Erzin, et al. Artificial bandwidth extension of spectral envelope along a Viterbi path, 2013, Speech Commun.

[10] Gautham J. Mysore, et al. Language informed bandwidth expansion, 2012, 2012 IEEE International Workshop on Machine Learning for Signal Processing.

[11] Peter Jax, et al. Wideband extension of telephone speech using a hidden Markov model, 2000, 2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421).

[12] Sebastian Möller, et al. Speech Quality Estimation: Models and Trends, 2011, IEEE Signal Processing Magazine.