A perceptually and physiologically motivated voice source model

Many glottal source models have been proposed, but none has been systematically validated perceptually. Our previous work showed that how well a model fits the negative peak of the flow derivative is the most important predictor of perceptual similarity to the target voice. In this study, we propose a new voice source model designed to capture the perceptually important aspects of source pulse shape. The new model and four other source models were fitted to 40 voice sources (20 male) obtained by inverse filtering and analysis-by-synthesis (AbS) of samples of natural normal and pathologic phonation. We generated synthetic copies of the voices using each modeled source pulse, with all other synthesis parameters held constant, and then conducted a visual sort-and-rate task in which listeners judged how similar each copy sounded to its target voice sample. The proposed model provided a more accurate fit and a better perceptual match to the target than did the other models.
