Oscillator-plus-Noise Modeling of Speech Signals
