Perceptual evaluation of voice source models.

Models of the voice source differ in how well they fit natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling. Neither fits to pulse shapes nor fits to landmark points on the pulses predicted the observed differences in quality. Further, although the source models fit the opening phase of the glottal pulses better than the closing phase, similarity in quality was better predicted by the timing and amplitude of the negative peak of the flow derivative (part of the closing phase) than by the timing and/or amplitude of peak glottal opening. These results indicate that simply knowing how, or how well, a particular source model fits a target source pulse in the time domain provides little insight into which aspects of the voice source are important to listeners.
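As a rough illustration of the analysis step, the sketch below applies classical (Torgerson) multidimensional scaling to a matrix of pairwise dissimilarities, such as one that might be derived from sort-and-rate distances. The abstract does not say which MDS variant the study used, so the choice of classical MDS, the function name `classical_mds`, and the six-stimulus toy data are all assumptions made for illustration only.

```python
import numpy as np


def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: embed items so that Euclidean distances
    approximate the given pairwise dissimilarities."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to get a Gram-like matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Symmetric eigendecomposition; keep the largest n_dims eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues that arise from non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical data: six stimuli placed in a made-up 2-D perceptual space.
# Real input would be listener-derived dissimilarities, not known coordinates.
coords = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.0, 3.0]]
)
dissim = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

config = classical_mds(dissim, n_dims=2)
recovered = np.linalg.norm(config[:, None, :] - config[None, :, :], axis=-1)
```

Because the toy dissimilarities are exact Euclidean distances, the recovered configuration reproduces them up to rotation and reflection. Listener data would generally not be Euclidean, in which case a nonmetric or individual-differences MDS procedure may be more appropriate.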
