Analysis by synthesis of pathological voices using the Klatt synthesizer

Abstract

The ability to synthesize pathological voices may provide a tool for the development of a standard protocol for assessment of vocal quality. An analysis-by-synthesis approach using the Klatt formant synthesizer was applied to study 24 tokens of the vowel /a/ spoken by males and females with moderate-to-severe voice disorders. Both temporal and spectral features of the natural waveforms were analyzed and the results were used to guide synthesis. Perceptual evaluation indicated that about half the synthetic voices matched the natural waveforms they modeled in quality. The stimuli that received poor ratings reflected failures to model very unsteady or “gargled” voices or failures in synthesizing perfect copies of the natural spectra. Several modifications to the Klatt synthesizer may improve synthesis of pathological voices. These modifications include providing jitter and shimmer parameters; updating synthesis parameters as a function of period, rather than absolute time; modeling diplophonia with independent parameters for fundamental frequency and amplitude variations; providing a parameter to increase low-frequency energy; and adding more pole-zero pairs.
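
To make the proposed jitter and shimmer parameters concrete, the following is a minimal sketch of period-by-period perturbation, in which each glottal cycle gets its own duration and amplitude rather than values updated on a fixed time grid. It is not the Klatt synthesizer's code or the authors' method: the function names, the uniform random perturbation model, and the percentage parameters are all assumptions made for illustration.

```python
import random


def perturbed_cycles(f0_hz, n_cycles, jitter_pct, shimmer_pct, seed=0):
    """Return one (period_seconds, amplitude) pair per glottal cycle.

    Hypothetical model: each period and amplitude is scaled by an
    independent uniform random factor bounded by the given percentage.
    """
    rng = random.Random(seed)
    base_period = 1.0 / f0_hz
    cycles = []
    for _ in range(n_cycles):
        # Jitter: cycle-to-cycle variation in period length.
        period = base_period * (1.0 + rng.uniform(-1.0, 1.0) * jitter_pct / 100.0)
        # Shimmer: cycle-to-cycle variation in amplitude.
        amplitude = 1.0 + rng.uniform(-1.0, 1.0) * shimmer_pct / 100.0
        cycles.append((period, amplitude))
    return cycles


def mean_jitter_pct(cycles):
    """Mean absolute difference between consecutive periods,
    expressed as a percentage of the mean period."""
    periods = [p for p, _ in cycles]
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


if __name__ == "__main__":
    cycles = perturbed_cycles(f0_hz=120.0, n_cycles=50, jitter_pct=2.0, shimmer_pct=5.0)
    print(f"measured jitter: {mean_jitter_pct(cycles):.2f}%")
```

Because the parameters are indexed by cycle rather than by absolute time, a perturbation applies to a whole period at once, which is the sense in which the abstract contrasts period-based and time-based updating.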
