Speech modeling and processing by low-dimensional dynamic glottal models

We discuss the use of low-dimensional physical models of the voice source for speech coding and processing applications. A class of waveform-adaptive dynamic glottal models and the associated parameter tracking procedures are illustrated. The model and the analysis procedures are assessed on signal transformations of recorded speech, obtained by fitting the model to the data and then acting on the physically oriented parameters of the voice source. In principle, the proposed class of models provides a tool both for estimating glottal source signals and for encoding the speech signal for transformation purposes. Applications of the model to time stretching and to frequency control (pitch shifting) are also illustrated. The experiments show that copy synthesis is perceptually almost indistinguishable from the target, and that time stretching and "pitch extrapolation" effects can be obtained with simple control strategies.
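The transformations described above act on the tracked model parameters rather than directly on the waveform. The following is a minimal sketch of what a "simple control strategy" of this kind could look like. It is not the authors' method. It assumes that the fitting stage produces per-frame trajectories of parameters such as a fundamental frequency `f0` and an amplitude `amp`. These names are hypothetical, since the actual parameter set of the glottal model is not given here. Time stretching resamples every trajectory onto a longer or shorter frame axis, and pitch shifting scales the `f0` trajectory. Resynthesis through the glottal model itself is not shown.

```python
import numpy as np


def stretch_trajectories(params, factor):
    """Time-stretch per-frame parameter trajectories by linear interpolation.

    params: dict mapping a (hypothetical) parameter name to a 1-D array of
            per-frame values produced by model fitting.
    factor: > 1 lengthens the utterance, < 1 shortens it.
    """
    out = {}
    for name, traj in params.items():
        traj = np.asarray(traj, dtype=float)
        n_in = len(traj)
        n_out = max(1, int(round(n_in * factor)))
        # Positions of the output frames on the original frame axis.
        x_out = np.linspace(0, n_in - 1, n_out)
        out[name] = np.interp(x_out, np.arange(n_in), traj)
    return out


def shift_pitch(params, semitones):
    """Scale the (hypothetical) 'f0' trajectory; other parameters are unchanged."""
    out = dict(params)  # shallow copy, so the input dict is not modified
    out["f0"] = np.asarray(params["f0"], dtype=float) * 2.0 ** (semitones / 12.0)
    return out


params = {"f0": np.array([100.0, 110.0, 120.0]), "amp": np.array([0.5, 0.6, 0.7])}
stretched = stretch_trajectories(params, 2.0)  # 3 frames become 6
shifted = shift_pitch(params, 12)              # one octave up
```

Here the stretched `f0` trajectory becomes `[100, 104, 108, 112, 116, 120]` and the shifted one becomes `[200, 220, 240]`. Linear interpolation is only the simplest choice. A real system might instead hold the frame rate fixed and adjust the synthesis hop size, or smooth the trajectories before resynthesis.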
