Disordered Speech Quality Estimation Using the Matching Pursuit Algorithm

This paper proposes a novel non-intrusive, auditory perception-based approach to disordered speech quality estimation. An adaptive time-frequency algorithm, the Matching Pursuit (MP) algorithm, generates a reference signal from the disordered speech signal itself, so no separately recorded clean reference is needed. The generated reference signal and the original degraded signal are then passed to the Perceptual Evaluation of Speech Quality (PESQ) estimator, standardized by the International Telecommunication Union (ITU), to obtain a quality score. The approach is tested on two different databases of tracheoesophageal speech samples. Results show that the method performs significantly better than conventional acoustic measures of disordered speech quality.
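To make the reference-generation step concrete, the following is a minimal sketch of a greedy Matching Pursuit decomposition in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the dictionary, the number of atoms, or the stopping rule, so the Gabor dictionary, its parameters, and the function names `gabor_dictionary` and `matching_pursuit` are all hypothetical choices. The PESQ scoring step is omitted because it requires an external implementation of the ITU standard.

```python
import numpy as np


def gabor_dictionary(n, scales=(16, 32, 64), n_freqs=8):
    """Build a small dictionary of unit-norm real Gabor atoms of length n.

    Scales, frequency grid, and phases are illustrative assumptions.
    """
    t = np.arange(n)
    atoms = []
    for s in scales:
        # Atom centers spaced by half a scale so neighbouring windows overlap.
        for center in range(0, n, s // 2):
            window = np.exp(-0.5 * ((t - center) / (s / 4)) ** 2)
            for k in range(n_freqs):
                freq = 0.5 * k / n_freqs  # cycles per sample, below Nyquist
                for phase in (0.0, np.pi / 2):
                    atom = window * np.cos(2 * np.pi * freq * (t - center) + phase)
                    norm = np.linalg.norm(atom)
                    if norm > 1e-8:  # skip numerically empty atoms
                        atoms.append(atom / norm)
    return np.array(atoms)


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: repeatedly subtract the atom best correlated with the residual.

    Returns (approximation, residual); their sum equals the input signal.
    """
    residual = np.asarray(signal, dtype=float).copy()
    approx = np.zeros_like(residual)
    for _ in range(n_atoms):
        corr = dictionary @ residual          # inner product with every atom
        best = int(np.argmax(np.abs(corr)))   # most strongly correlated atom
        component = corr[best] * dictionary[best]
        approx += component
        residual -= component
    return approx, residual


if __name__ == "__main__":
    # Synthetic stand-in for a degraded signal: a windowed tone plus noise.
    rng = np.random.default_rng(0)
    n = 256
    t = np.arange(n)
    tone = np.sin(2 * np.pi * 0.125 * t) * np.exp(-0.5 * ((t - 128) / 16) ** 2)
    degraded = tone + 0.1 * rng.standard_normal(n)

    reference, residual = matching_pursuit(degraded, gabor_dictionary(n), n_atoms=10)
    print("residual energy fraction:", np.sum(residual**2) / np.sum(degraded**2))
```

In this sketch the sum of the first few selected atoms plays the role of the generated reference, on the assumption that the coherent structure of the speech is captured early while noise-like components remain in the residual. The reference and the original degraded signal would then be supplied as the two inputs to a PESQ implementation.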
