Paper information - Oscillator-plus-Noise Modeling of Speech Signals - 字舞流文

Oscillator-plus-Noise Modeling of Speech Signals

Erhard Rank

[1] F. Girosi,et al. Networks for approximation and learning , 1990, Proc. IEEE.

[2] J. O. Smith,et al. Estimating glottal aspiration noise via wavelet thresholding and best-basis thresholding , 2001, Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics.

[3] Gernot Kubin,et al. Nonlinear Synthesis of Vowels in the LP Residual Domain with a Regularized RBF Network , 2001, IWANN.

[4] Sharad Singhal,et al. Intelligibility as a function of speech coding method for template-based speech synthesis , 1993, EUROSPEECH.

[5] Holger Kantz,et al. Practical implementation of nonlinear time series methods: The TISEAN package. , 1998, Chaos.

[6] James J. Carroll,et al. Approximation of nonlinear systems with radial basis function neural networks , 2001, IEEE Trans. Neural Networks.

[7] Christopher J. Zarowski. Limitations on SNR estimator accuracy , 2002, IEEE Trans. Signal Process..

[8] Nick Campbell. Prosody and the selection of units for concatenation synthesis , 1994, SSW.

[9] Michael E. Tipping. Sparse Bayesian Learning and the Relevance Vector Machine , 2001, J. Mach. Learn. Res..

[10] T. Sauer. A noise reduction method for signals from nonlinear systems , 1992 .

[11] A. N. Tikhonov,et al. Solutions of ill-posed problems , 1977 .

[12] Raymond N. J. Veldhuis,et al. The effect of speech melody on voice quality , 2001, Speech Commun..

[13] Josef Heiler. Optimized frame selection for variable frame rate synthesis , 1982, ICASSP.

[14] Gernot Kubin,et al. An oscillator-plus-noise model for speech synthesis , 2006, Speech Commun..

[15] Jean Schoentgen,et al. Predictable and random components of jitter , 1997, Speech Commun..

[16] Erhard Rank,et al. Application of Bayesian trained RBF networks to nonlinear time-series modeling , 2003, Signal Process..

[17] C. Adnene,et al. Analysis of pathological voices by speech processing , 2003, Seventh International Symposium on Signal Processing and Its Applications.

[18] A. Rosenberg. Effect of glottal pulse shape on the quality of natural vowels. , 1969 .

[19] Floris Takens,et al. On the numerical determination of the dimension of an attractor , 1985 .

[20] Jack J. Jiang,et al. Chaotic vibration induced by turbulent noise in a two-mass model of vocal folds. , 2002, The Journal of the Acoustical Society of America.

[21] Joseph Olive,et al. A scheme for concatenating units for speech synthesis , 1980, ICASSP.

[22] Raymond N. J. Veldhuis,et al. Reducing audible spectral discontinuities , 2001, IEEE Trans. Speech Audio Process..

[23] Kurt Hornik,et al. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks , 1990, Neural Networks.

[24] Kevin Judd,et al. Embedding as a modeling problem , 1998 .

[25] Gérard Bailly,et al. A three-dimensional linear articulatory model based on MRI data , 1998, ICSLP.

[26] J. Pereira. AC analysis of the three-mass model of the larynx , 1988, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[27] J. Friedman. Multivariate adaptive regression splines , 1990 .

[28] Y. Stylianou,et al. Decomposition of speech signals into a deterministic and a stochastic part , 1996, ICSLP.

[29] Marc C. Beutnagel,et al. The AT&T NEXT-GEN TTS system , 1999 .

[30] Jorge C. Lucero,et al. Dynamics of the two‐mass model of the vocal folds: Equilibria, bifurcations, and oscillation region , 1993 .

[31] Tomaso A. Poggio,et al. Extensions of a Theory of Networks for Approximation and Learning , 1990, NIPS.

[32] Paavo Alku,et al. Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering , 1991, Speech Commun..

[33] Donald G. Childers,et al. Glottal source modeling for voice conversion , 1995, Speech Commun..

[34] José Carlos Príncipe,et al. The gamma model--A new neural model for temporal processing , 1992, Neural Networks.

[35] David J. C. MacKay,et al. Bayesian Interpolation , 1992, Neural Computation.

[36] L. Tsimring,et al. The analysis of observed chaotic data in physical systems , 1993 .

[37] D. Mitchell Wilkes,et al. Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk , 2004, IEEE Transactions on Biomedical Engineering.

[38] Perry R. Cook,et al. SPASM, a Real-Time Vocal Tract Physical Model Controller; and Singer, the Companion Software Synthesis System , 1993 .

[39] Michael W. Macon,et al. Control of spectral dynamics in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process..

[40] Francis C. Moon,et al. Chaotic and fractal dynamics , 1992 .

[41] Nick Campbell,et al. Objective distance measures for assessing concatenative speech synthesis , 1999, EUROSPEECH.

[42] Marcos Faúndez-Zanuy,et al. A Comparative Study Between Linear and Nonlinear Speech Prediction , 1997, IWANN.

[43] I. Titze. The physics of small-amplitude oscillation of the vocal folds. , 1988, The Journal of the Acoustical Society of America.

[44] M. Stone. Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[45] Nick Campbell,et al. Optimising unit selection with voice source and formants in the CHATR speech synthesis system , 1997, EUROSPEECH.

[46] J. Flanagan,et al. Synthesis of voiced sounds from a two-mass model of the vocal cords , 1972 .

[47] Mark A. Clements,et al. Speech concatenation and synthesis using an overlap-add sinusoidal model , 1996, ICASSP.

[48] Henry Leung,et al. Prediction of noisy chaotic time series using an optimal radial basis function neural network , 2001, IEEE Trans. Neural Networks.

[49] Alan W. Black,et al. CHATR: a generic speech synthesis system , 1994, COLING.

[50] John Makhoul,et al. Adaptive lattice analysis of speech , 1981 .

[51] Y. Sagisaka,et al. Speech synthesis by rule using an optimal selection of non-uniform synthesis units , 1988, ICASSP.

[52] B. Atal,et al. Role of multi-pulse excitation in synthesis of natural-sounding voiced speech , 1987, ICASSP.

[53] Wolfgang Hess,et al. Pitch Determination of Speech Signals , 1983 .

[54] Richard J. Povinelli,et al. Speech recognition using reconstructed phase space features , 2003, ICASSP.

[55] Sverre Holm. Automatic generation of mixed excitation in a linear predictive speech synthesizer , 1981, ICASSP.

[56] Gérard Bailly,et al. The Cost258 Signal Generation Test Array , 2000, LREC.

[57] G. Kubin,et al. A multi-band nonlinear oscillator model for speech , 1998, Conference Record of Thirty-Second Asilomar Conference on Signals, Systems and Computers.

[58] Erhard Rank,et al. Combining non-uniform unit selection with diphone based synthesis , 2003, INTERSPEECH.

[59] Steve McLaughlin,et al. Stable speech synthesis using recurrent radial basis functions , 1999, EUROSPEECH.

[60] Thomas F. Quatieri,et al. Speech analysis/Synthesis based on a sinusoidal representation , 1986, IEEE Trans. Acoust. Speech Signal Process..

[61] Xavier Serra,et al. A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition , 1989 .

[62] Geoffrey E. Hinton,et al. Bayesian Learning for Neural Networks , 1995 .

[63] Attila Ferencz,et al. The new version of the ROMVOX text-to-speech synthesis system based on a hybrid time domain-LPC synthesis technique , 1998, ICSLP.

[64] Steve McLaughlin,et al. Dynamical modelling of vowel sounds as a synthesis tool , 1996, ICSLP.

[65] I. Titze,et al. Voice simulation with a body-cover model of the vocal folds. , 1995, The Journal of the Acoustical Society of America.

[66] J. Makhoul,et al. Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[67] Dik J. Hermes,et al. Synthesis of breathy vowels: Some research methods , 1991, Speech Commun..

[68] Jean Schoentgen,et al. Non-linear signal representation and its application to the modelling of the glottal waveform , 1990, Speech Commun..

[69] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[70] Steve McLaughlin,et al. Speech characterization and synthesis by nonlinear methods , 1999, IEEE Trans. Speech Audio Process..

[71] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, ICASSP.

[72] Arild Lacroix,et al. Generation of nasalized speech sounds based on branched tube models obtained from separate mouth and nose outputs , 2003, ICASSP.

[73] W. Bastiaan Kleijn,et al. A speech coder based on decomposition of characteristic waveforms , 1995, ICASSP.

[74] B. Townshend,et al. Nonlinear prediction of speech , 1991, ICASSP.

[75] Carmen Peláez-Moreno,et al. Backward adaptive RBF-based hybrid predictors for CELP-type coders at medium bit-rates , 1999, EUROSPEECH.

[76] B. Atal,et al. Changing pitch and duration in LPC synthesized speech using multipulse excitation , 1983 .

[77] John E. Markel,et al. Linear Prediction of Speech , 1976, Communication and Cybernetics.

[78] Jean Schoentgen,et al. Glottal waveform synthesis with Volterra shaping functions , 1992, Speech Commun..

[79] Bishnu S. Atal,et al. Speech synthesis by linear interpolation of spectral parameters between dyad boundaries , 1979 .

[80] Simon Haykin,et al. A dynamic regularized Gaussian radial basis function network for nonlinear, nonstationary time series prediction , 1995, ICASSP.

[81] Nick Campbell,et al. Acoustic nature and perceptual testing of corpora of emotional speech , 1998, ICSLP.

[82] Joseph P. Olive,et al. Speech resynthesis from phoneme-related parameters. , 1975 .

[83] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[84] Fumitada Itakura,et al. An Audio Response Unit Based on Partial Autocorrelation , 1972, IEEE Trans. Commun..

[85] Gernot Kubin,et al. Detection of chaotic behaviour in speech signals using Fraser's mutual information algorithm , 1991 .

[86] Gernot Kubin,et al. Nonlinear long-term prediction of speech signals , 1997, ICASSP.

[87] Gernot Kubin,et al. Performance of noise excitation for unvoiced speech , 1993, IEEE Workshop on Speech Coding for Telecommunications.

[88] Wolfgang Wokurek. Time-frequency analysis of the glottal opening , 1997, ICASSP.

[89] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..

[90] Celia Scully. Speech production simulated with a functional model of the larynx and the vocal tract , 1986 .

[91] S. D. Hansen,et al. Non-linear short-term prediction in speech coding , 1994, ICASSP.

[92] Pierre Badin,et al. Vocal tract acoustics using the transmission line matrix (TLM) method , 1996, ICSLP.

[93] Simon Haykin,et al. Regularized radial basis functional networks: theory and applications , 2001 .

[94] M. Jack,et al. Robust F0 and jitter estimation in pathological voices , 1996, ICSLP.

[95] J. C. Pereira. Some results from the three-mass model of the larynx , 1989, Images of the Twenty-First Century. Proceedings of the Annual International Engineering in Medicine and Biology Society.

[96] Mike Wu,et al. Decomposition of speech into voiced and unvoiced components based on a state-space signal model , 2003, ICASSP.

[97] Steve McLaughlin,et al. A nonlinear algorithm for epoch marking in speech signals using Poincaré maps , 1998, 9th European Signal Processing Conference (EUSIPCO 1998).

[98] P.J.B. Jackson,et al. Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data , 2000 .

[99] Raymond N. J. Veldhuis,et al. A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design , 1998 .

[100] Marcelo de Oliveira Rosa,et al. Adaptive estimation of residue signal for voice pathology diagnosis , 2000, IEEE Trans. Biomed. Eng..

[101] David G. Messerschmitt,et al. Adaptive Filters: Structures, Algorithms and Applications , 1984 .

[102] Michael W. Macon,et al. A perceptual evaluation of distance measures for concatenative speech synthesis , 1998, ICSLP.

[103] Gernot Kubin,et al. Synthesis and coding of continuous speech with the nonlinear oscillator model , 1996, ICASSP.

[104] Willem Bastiaan Kleijn,et al. Time-scale modification of speech based on a nonlinear oscillator model , 1994, ICASSP.

[105] W.B. Kleijn,et al. Transformation and decomposition of the speech signal for coding , 1994, IEEE Signal Processing Letters.

[106] Celia Scully,et al. The representation of stored plans for articulatory coordination and constraints in a composite model of speech production , 1983, Speech Commun..

[107] T. Moon. The expectation-maximization algorithm , 1996, IEEE Signal Process. Mag..

[108] Petros Maragos,et al. Speech analysis and feature extraction using chaotic models , 2002, ICASSP.

[109] Attila Ferencz,et al. On a hybrid time domain-LPC technique for prosody superimposing used for speech synthesis , 1999, EUROSPEECH.

[110] Douglas D. O'Shaughnessy,et al. Speech communication: human and machine , 1987 .

[111] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[112] Dennis H. Klatt,et al. Software for a cascade/parallel formant synthesizer , 1980 .

[113] Federico Avanzini,et al. Model-based synthesis and transformation of voiced sounds , 2000 .

[114] Gunnar Fant,et al. Acoustic Theory Of Speech Production , 1960 .

[115] Gérard Bailly. A Parametric Harmonic + Noise Model , 2002 .

[116] Tohru Ifukube,et al. Two 1/f fluctuations in sustained phonation and their roles on naturalness of synthetic voice , 1996, Proceedings of Third International Conference on Electronics, Circuits, and Systems.

[117] Angela D. Friederici,et al. On the relations of semantic and acoustic properties of emotions , 1999 .

[118] Jean Schoentgen,et al. An algorithm for the measurement of jitter , 1991, Speech Commun..

[119] Iain Mann,et al. An investigation of nonlinear speech synthesis and pitch modification techniques , 2000 .

[120] Bishnu S. Atal,et al. Efficient coding of LPC parameters by temporal decomposition , 1983, ICASSP.

[121] Thierry Dutoit,et al. MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database , 1993, Speech Commun..

[122] Thierry Dutoit,et al. From MBROLA to NU-MBROLA , 2001, SSW.

[123] Jacques M. B. Terken. Variability and Speaking Styles in Speech Synthesis , 2002 .

[124] John Moody,et al. Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[125] Erhard Rank. Concatenative Speech Synthesis Using SRELP , 2002 .

[126] Dmitry E. Terez,et al. Robust pitch determination using nonlinear state-space embedding , 2002, ICASSP.

[127] Steve McLaughlin,et al. Synthesising natural-sounding vowels using a nonlinear dynamical model , 2001, Signal Process..

[128] J. Locke,et al. Learning to speak , 1993 .

[129] E. Keller. Improvements in speech synthesis: COST 258, the naturalness of synthetic speech , 2002 .