Perceptual non-intrusive speech quality assessment using a self-organizing map

Purpose – This paper proposes a new non-intrusive method for assessing the speech quality of voice communication systems and evaluates its performance.

Design/methodology/approach – The method measures perception-based objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into equivalent subjective mean opinion scores (MOSs). The required clustering and matching are performed by an efficient data-mining tool known as the self-organizing map (SOM). The proposed method was examined using a wide range of distortions, including speech compression, wireless channel impairments, VoIP channel impairments, and modifications to the signal from features such as automatic gain control (AGC).

Findings – The experimental results reported ...
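The pipeline described in the abstract (cluster clean-speech vectors into a codebook with a SOM, match each output frame to its nearest codebook entry, then map the resulting distance to a MOS) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `train_som`, `mean_bmu_distance`, and `distance_to_mos`, the grid size, the learning schedule, the plain Euclidean distance, and the linear distance-to-MOS mapping are all assumptions made for illustration. In the paper the inputs would be perceptual parameter vectors from voiced frames, and the mapping would be fitted to subjective scores.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns a codebook of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training vectors.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
            # Best-matching unit (BMU): node whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood around the BMU on the node grid.
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def mean_bmu_distance(frames, codebook):
    """Average Euclidean distance from each frame to its nearest codebook vector."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def distance_to_mos(dist, a=4.5, b=1.0):
    """Hypothetical linear mapping from distance to MOS, clipped to the 1..5 scale."""
    return float(np.clip(a - b * dist, 1.0, 5.0))
```

A possible usage, again under the assumptions above: train the codebook once on vectors from clean speech, then score a degraded signal with `distance_to_mos(mean_bmu_distance(frames, codebook))`. Larger distances from the clean-speech codebook yield lower predicted scores, which is the monotone relationship the abstract relies on.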
