ANIQUE: An Auditory Model for Single-Ended Speech Quality Estimation

To predict the subjective quality of speech signals degraded by telecommunication networks, conventional objective models require both the reference source speech signal, which is applied as the input to the network, and the degraded speech. Non-intrusive estimation of speech quality is a challenging problem because only the degraded speech signal is available, yet it is needed in many real applications where the source speech signal cannot be obtained. In this paper, we propose a new approach to non-intrusive speech quality estimation that uses the temporal envelope representation of speech. The proposed auditory non-intrusive quality estimation (ANIQUE) model is based on the functional roles of the human auditory system and the characteristics of the human articulation system. Experimental evaluations on 35 different tests demonstrated the effectiveness of the proposed model.
