Automatic voice-source parameterization of natural speech

We present here our work in automatic parameterization of natural speech by means of a pitch synchronous source-filter decomposition algorithm. The derivative glottal source is modelled using the Liljencrants-Fant (LF) model. The model parameters are obtained simultaneously with the coefficients of an all-pole filter representing the vocal tract response by means of a quadratic programming algorithm. Synthetic data has been created and analyzed in order to show the appropriate function of the estimation method. The parameterization results in high quality synthesized speech for voiced frames. Voice quality extraction is performed on basis to the LF source representation. The inherent modelling of the voice source makes it suitable for voice modification tasks. Work is in progress to add this speech representation to emotional speech synthesis and voice conversion algorithms.

[1]  Redwan Salami,et al.  GSM enhanced full rate speech codec , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  B. Doval,et al.  On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation. , 2004, The Journal of the Acoustical Society of America.

[3]  Julius O. Smith,et al.  Toward a high-quality singing synthesizer with vocal texture control , 2002 .

[4]  Qiguang Lin Speech production theory and articulatory speech synthesis , 1991 .

[5]  G. Fant Dept. for Speech, Music and Hearing Quarterly Progress and Status Report the Lf-model Revisited. Transformations and Frequency Domain Analysis the Lf-model Revisited. Transformations and Frequency Domain Analysis* , 2022 .

[6]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[7]  Horacio Rodríguez Hontoria,et al.  Proyecto ALIADO : tecnologías del habla y el lenguaje para un asistente personal , 2003 .

[8]  Lou Boves,et al.  Fitting a LF-model to inverse filter signals , 1993, EUROSPEECH.

[9]  D. Klatt,et al.  Analysis, synthesis, and perception of voice quality variations among female and male talkers. , 1990, The Journal of the Acoustical Society of America.

[10]  Youngmoo E. Kim Singing voice analysis/synthesis , 2003 .

[11]  H. Strube,et al.  SIM--simultaneous inverse filtering and matching of a glottal flow model for acoustic speech signals. , 2001, The Journal of the Acoustical Society of America.

[12]  C. Gobl The Voice Source in Speech Communication - Production and Perception Experiments Involving Inverse Filtering and Synthesis , 2003 .

[13]  T Schaaf,et al.  Technology and Corpora for Speech to Speech Translation Title: Asr Progress Report , 2005 .