A Methodology for Improving PESQ Accuracy for Chinese Speech

Unlike English and most other European languages, Mandarin Chinese has two distinctive characteristics that may affect its intelligibility after processing by speech processing systems: its consonant-vowel-consonant (CVC) phonetic structure and its use of tones. Consequently, the Perceptual Evaluation of Speech Quality (PESQ) objective measure, which has proven effective for assessing systems that process English and several other languages, may not accurately measure the speech quality of systems that process Chinese speech. We therefore evaluated PESQ to determine whether the intelligibility-related problems arising from these two characteristics are accounted for in its speech quality computation. The evaluation shows that they are not: subjective intelligibility scores and PESQ scores correlate only weakly. To improve this correlation for Chinese speech, a method called consonant amplification is proposed and evaluated with PESQ.
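The abstract judges PESQ by how well its scores correlate with subjective intelligibility, but it does not say which correlation measure was used. The following is a minimal sketch that assumes Pearson's linear correlation coefficient r, computed over one (intelligibility, PESQ) score pair per processing condition. The function name `pearson_r` and this per-condition pairing are illustrative assumptions, not the authors' procedure.

```python
import math
from typing import Sequence


def pearson_r(intelligibility: Sequence[float], pesq: Sequence[float]) -> float:
    """Pearson correlation between subjective intelligibility and PESQ scores.

    Assumption (not from the source): each index i is one processing
    condition, intelligibility[i] is its subjective score (for example,
    percent of words correctly identified in a listening test), and
    pesq[i] is the PESQ score for the same condition.
    """
    if len(intelligibility) != len(pesq):
        raise ValueError("score lists must have the same length")
    n = len(intelligibility)
    if n < 2:
        raise ValueError("need at least two conditions")

    mean_x = sum(intelligibility) / n
    mean_y = sum(pesq) / n

    # Covariance and both variances share the same 1/n factor, so it cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(intelligibility, pesq))
    var_x = sum((x - mean_x) ** 2 for x in intelligibility)
    var_y = sum((y - mean_y) ** 2 for y in pesq)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one score list is constant")

    return cov / math.sqrt(var_x * var_y)
```

To use it, call `pearson_r(intelligibility, pesq)` with one entry per condition. A value near 1 means PESQ tracks intelligibility closely, and a value near 0 corresponds to the weak correlation the evaluation reports. Pearson's r only measures linear agreement. If only the ranking of conditions matters, a rank correlation such as Spearman's would be a reasonable alternative.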
