An oscillator-plus-noise model for speech synthesis

Abstract: The autonomous oscillator model for speech synthesis is augmented by a nonlinear predictor that re-generates the modulated noise-like component of speech signals. Combined with vocal tract modeling by linear prediction, the resulting ‘oscillator-plus-noise’ model re-generates the spectral content of stationary wide-band vowel signals with high fidelity. To model mixed-excitation speech signals (such as voiced fricatives) adequately, the model is extended by a second linear prediction path that shapes the spectrum of the noise-like component independently. A single model can thus faithfully re-generate not only sustained voiced and mixed-excitation phonemes but also stationary unvoiced sounds.
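The signal flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. A plain pulse train stands in for the output of the autonomous oscillator, and a fixed pitch-synchronous raised-cosine envelope stands in for the noise modulation that the nonlinear predictor would supply. The function names, coefficient values, and `noise_gain` are all hypothetical. Only the overall structure follows the abstract: a periodic component and a modulated noise component, each shaped by its own all-pole (linear prediction) filter and then summed.

```python
import math
import random


def all_pole_filter(excitation, a):
    """Filter through 1 / (1 + a[0] z^-1 + a[1] z^-2 + ...), an LP synthesis filter."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                y -= coeff * out[n - k]
        out.append(y)
    return out


def synthesize(n_samples, period, a_voiced, a_noise, noise_gain=0.3, seed=0):
    """Sum a periodic path and a modulated-noise path, each with its own LP filter."""
    rng = random.Random(seed)
    # Assumption: a pulse train replaces the oscillator-generated excitation.
    periodic = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    # Assumption: a raised-cosine envelope replaces the predicted noise modulation.
    envelope = [0.5 * (1.0 + math.cos(2.0 * math.pi * n / period))
                for n in range(n_samples)]
    noise = [noise_gain * e * rng.gauss(0.0, 1.0) for e in envelope]
    # First LP path shapes the periodic component (vocal tract model).
    voiced = all_pole_filter(periodic, a_voiced)
    # Second LP path shapes the noise component independently.
    shaped_noise = all_pole_filter(noise, a_noise)
    return [v + w for v, w in zip(voiced, shaped_noise)]


# Hypothetical, stable coefficient sets chosen only for illustration.
signal = synthesize(400, period=80, a_voiced=[-1.3, 0.8], a_noise=[0.5])
```

In this sketch, setting `noise_gain` to zero leaves only the periodic path. Replacing the pulse train with zeros would leave only shaped noise, which mirrors the abstract's point that one structure can cover voiced, mixed-excitation, and unvoiced sounds.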
