Measuring vocal quality with speech synthesis

Much previous research has demonstrated that listeners do not agree well when using traditional rating scales to measure pathological voice quality. Although these findings may indicate that listeners are inherently unable to agree in their perception of such complex auditory stimuli, another explanation implicates the measurement method itself (rating scale judgments) as the culprit. An alternative method of assessing quality, listener-mediated analysis-synthesis, was devised to test this possibility. In this approach, listeners explicitly compare synthetic and natural voice samples and adjust speech synthesizer parameters to create auditory matches to voice stimuli. The method is designed to replace unstable internal standards for qualities like breathiness and roughness with externally presented stimuli, and thereby to overcome major hypothesized sources of disagreement in rating scale judgments. In a preliminary test of the reliability of this method, listeners were asked to adjust the signal-to-noise ratio for 12 synthetic pathological voices so that the resulting stimuli matched the natural target voices as closely as possible. For comparison with the synthesis judgments, listeners also judged the noisiness of the natural stimuli in a separate task using a traditional visual-analog rating scale. For 9 of the 12 voices, agreement among listeners was significantly (and substantially) greater for the synthesis task than for the rating scale task. Response variances for the two tasks did not differ for the remaining three voices. However, a second experiment showed that the synthesis settings that listeners selected for these three voices were within a difference limen, and therefore the observed differences were perceptually insignificant. These results indicate that listeners can in fact agree in their perceptual assessments of voice quality, and that analysis-synthesis can measure perception reliably.
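To make the adjusted parameter concrete, the following is a minimal sketch of how a noise component could be scaled to reach a requested signal-to-noise ratio before being added to a harmonic source. It is an illustration only, not the synthesizer used in the study: the function names, the five-harmonic toy source, the Gaussian noise, and the 10 kHz sampling rate are all assumptions introduced here.

```python
import math
import random


def power(samples):
    """Mean-square power of a list of samples."""
    return sum(v * v for v in samples) / len(samples)


def scale_noise_to_snr(harmonic, noise, snr_db):
    """Return a copy of `noise` scaled so that
    10*log10(power(harmonic) / power(scaled_noise)) equals snr_db."""
    target_noise_power = power(harmonic) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [gain * n for n in noise]


def synthesize(harmonic, noise, snr_db):
    """Add the rescaled noise to the harmonic source (hypothetical helper)."""
    scaled = scale_noise_to_snr(harmonic, noise, snr_db)
    return [h + n for h, n in zip(harmonic, scaled)]


def measured_snr_db(harmonic, noise_component):
    """SNR in dB between a harmonic source and a noise component."""
    return 10 * math.log10(power(harmonic) / power(noise_component))


# Toy stimulus (assumed values): 0.1 s of a 120 Hz source with five
# harmonics, plus Gaussian noise, sampled at 10 kHz.
rng = random.Random(0)
fs = 10000
f0 = 120.0
harmonic = [
    sum(math.sin(2 * math.pi * f0 * k * t / fs) / k for k in range(1, 6))
    for t in range(fs // 10)
]
noise = [rng.gauss(0.0, 1.0) for _ in harmonic]

# A listener's setting of, say, 15 dB would yield this stimulus.
stimulus = synthesize(harmonic, noise, 15.0)
```

In a matching task of the kind described above, a listener would repeatedly change `snr_db` and compare the resulting stimulus with the natural target; the spread of the final settings across listeners is then the quantity compared with the spread of rating scale responses.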
