Excitation-based Voice Quality Analysis and Modification

This paper investigates the differences occuring in the excitation for different voice qualities. Its goal is two-fold. First a large corpus containing three voice qualities (modal, soft and loud) uttered by the same speaker is analyzed and significant differences in characteristics extracted from the excitation are observed. Secondly rules of modification derived from the analysis are used to build a voice quality transformation system applied as a post-process to HMM-based speech synthesis. The system is shown to effectively achieve the transformations while maintaining the delivered quality.

[1]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[2]  H. Zen,et al.  An HMM-based speech synthesis system applied to English , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002..

[3]  P. Alku,et al.  Normalized amplitude quotient for parametrization of the glottal flow. , 2002, The Journal of the Acoustical Society of America.

[4]  D. Klatt,et al.  Analysis, synthesis, and perception of voice quality variations among female and male talkers. , 1990, The Journal of the Acoustical Society of America.

[5]  Christophe d'Alessandro,et al.  A method for glottal formant frequency estimation , 2004, INTERSPEECH.

[6]  Paavo Alku,et al.  Comparison of multiple voice source parameters in different phonation types , 2007, INTERSPEECH.

[7]  Thierry Dutoit,et al.  Complex cepstrum-based decomposition of speech for glottal source estimation , 2009, INTERSPEECH.

[8]  Marc Schröder,et al.  Expressing vocal effort in concatenative synthesis , 2003 .

[9]  Christophe d'Alessandro,et al.  The voice source as a causal/anticausal linear filter , 2003 .

[10]  Marc Schröder,et al.  Voice quality interpolation for emotional text-to-speech synthesis , 2005, INTERSPEECH.

[11]  Thierry Dutoit,et al.  Glottal closure and opening instant detection from speech signals , 2019, INTERSPEECH.

[12]  Douglas A. Reynolds,et al.  Modeling of the glottal flow derivative waveform with application to speaker identification , 1999, IEEE Trans. Speech Audio Process..

[13]  Yannis Stylianou,et al.  Applying the harmonic plus noise model in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process..

[14]  Thierry Dutoit,et al.  A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis , 2019, INTERSPEECH.

[15]  Christophe d’Alessandro,et al.  Voice Source Parameters and Prosodic Analysis , 2006 .