A preliminary study of voice quality transformation based on modifications to the neutral vocal tract area function

Abstract: This study pursues the idea that voice quality can be partially represented by the underlying shape of a speaker's neutral vocal tract. Using an area function model that allows direct access to the neutral tract shape, four separate modifications were made to one male speaker's vocal tract. The modifications imposed constrictive or expansive effects on the pharyngeal and oral portions of the neutral area function, as well as on the lip aperture and the epi-laryngeal tube. A single-word utterance was first synthesized by superimposing deformation patterns appropriate for the word onto the original neutral tract shape (area function). Four additional samples of the word were then synthesized, each using a different modified neutral area function. The modifications were assessed by comparing the F1–F2 formant trajectories of the original utterance with those of the modified versions. The formant frequencies shifted within the F1–F2 plane in directions predictable from simple tube acoustics. However, the modified voice qualities did not preserve the shape of the original F1–F2 trajectory. In other words, the modifications did not produce a simple linear transformation of the formant frequencies, even though the “articulatory dynamics” (the deformation patterns of the area function) were identical in all cases. These somewhat artificial vocal tract modifications were also compared with formant frequencies extracted from recordings of a speaker attempting to produce the same types of modifications. In general, the speaker's formant trajectories showed some similarities to the synthesized versions. However, the speaker also appeared to grade the “level” of voice quality imposed on the utterance, depending on whether the demands of the voice quality competed with the linguistic demands of a given phonetic segment.
Finally, to demonstrate this type of voice quality modification in a broader context, the same procedures were applied to sentence-level speech, and the results were again shown as F1–F2 formant trajectories.
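
The phrase "directions predictable from simple tube acoustics" can be illustrated with a small sketch. The code below is not the synthesizer used in the study. It assumes a lossless tube that is closed at the glottis and ideally open at the lips (no radiation load or wall losses), and it uses a hypothetical uniform "neutral" tract of 44 sections, each 0.4 cm long with an area of 3 cm². The names `tube_formants` and `scale_region`, the speed of sound, and the scaling factor are all assumptions chosen for illustration.

```python
import math

C_SOUND = 35000.0  # assumed speed of sound in cm/s


def tube_formants(areas, section_len, fmax=5000.0, df=1.0):
    """Resonances (Hz) of a lossless tube, closed at the glottis and
    ideally open at the lips.

    areas: cross-sectional areas in cm^2, ordered glottis to lips
    section_len: length of each section in cm
    """
    def lip_term(freq):
        # Chain matrix of the whole tube, written as [[a, j*b], [j*c, d]]
        # with real a, b, c, d. With zero pressure at the lips,
        # U_lips / U_glottis = 1 / d, so resonances are the zeros of d.
        # Air density cancels out of d, so impedance is taken as 1 / area.
        k_len = 2.0 * math.pi * freq / C_SOUND * section_len
        cs, sn = math.cos(k_len), math.sin(k_len)
        a, b, c, d = 1.0, 0.0, 0.0, 1.0
        for area in areas:
            z = 1.0 / area
            a, b, c, d = (a * cs - b * sn / z,
                          a * z * sn + b * cs,
                          c * cs + d * sn / z,
                          d * cs - c * z * sn)
        return d

    formants = []
    f_prev, d_prev = df, lip_term(df)
    for i in range(2, int(fmax / df)):
        f_now = i * df
        d_now = lip_term(f_now)
        if d_prev * d_now < 0.0:
            # Linear interpolation between the two bracketing grid points
            formants.append(f_prev - d_prev * (f_now - f_prev) / (d_now - d_prev))
        f_prev, d_prev = f_now, d_now
    return formants


def scale_region(areas, start, stop, factor):
    """Copy of areas with sections start..stop-1 scaled by factor
    (factor < 1 constricts, factor > 1 expands)."""
    return [a * factor if start <= i < stop else a
            for i, a in enumerate(areas)]


# Hypothetical neutral tract: uniform tube, 44 sections of 0.4 cm each
section_len = 0.4
neutral = [3.0] * 44

# Constrict the back (pharyngeal) half of the tube to half its area
pharynx_constricted = scale_region(neutral, 0, 22, 0.5)

f_neutral = tube_formants(neutral, section_len)
f_mod = tube_formants(pharynx_constricted, section_len)
print("neutral F1, F2:", f_neutral[:2])
print("constricted pharynx F1, F2:", f_mod[:2])
```

For the uniform tube, the resonances fall at the quarter-wavelength values (odd multiples of c/4L). Constricting the back half raises F1 and lowers F2 in this idealized model, which is the kind of directional shift that simple tube acoustics predicts. A real area function, with losses and lip radiation included, would give different absolute values.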
