Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis
Heiga Zen | Hideki Kawahara | Yannis Agiomyrgiannakis
[1] I. R. Titze, et al. Perception of pitch and roughness in vocal signals with subharmonics, 2001, Journal of Voice.
[2] Roy D. Patterson, et al. Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, 1999, EUROSPEECH.
[3] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[4] Heiga Zen, et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005, 2007, IEICE Trans. Inf. Syst.
[5] G. P. Moore, et al. A model for vocal fold vibratory motion, contact area, and the electroglottogram, 1986, The Journal of the Acoustical Society of America.
[6] J. Liljencrants, et al. A four-parameter model of glottal flow, 1985, Dept. for Speech, Music and Hearing Quarterly Progress and Status Report (STL-QPSR).
[7] Abeer Alwan, et al. Reducing F0 Frame Error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Hideki Kawahara, et al. Temporally variable multi-aspect N-way morphing based on interference-free speech representations, 2013, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.
[9] Abeer Alwan, et al. Perceptual differences among models of the voice source: Further evidence, 2014.
[10] Ken-Ichi Sakakibara, et al. Physiological observations and synthesis of subharmonic voices, 2011.
[11] P. Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, 1993.
[12] D. Klatt, et al. Analysis, synthesis, and perception of voice quality variations among female and male talkers, 1990, The Journal of the Acoustical Society of America.
[13] Petros Maragos, et al. On amplitude and frequency demodulation using energy operators, 1993, IEEE Trans. Signal Process.
[14] Hideki Kawahara, et al. Fast and Reliable F0 Estimation Method Based on the Period Extraction of Vocal Fold Vibration of Singing Voice and Speech, 2009.
[15] D. Slepian, et al. Prolate spheroidal wave functions, Fourier analysis and uncertainty - II, 1961.
[16] H. Pollak, et al. Prolate spheroidal wave functions, Fourier analysis and uncertainty - III: The dimension of the space of essentially time- and band-limited signals, 1962.
[17] T. Abe, et al. The IF Spectrogram: A New Spectral Representation, 1997.
[18] Daniel P. W. Ellis, et al. Noise Robust Pitch Tracking by Subband Autocorrelation Classification, 2012, INTERSPEECH.
[19] Hirokazu Kameoka, et al. Generative Modeling of Voice Fundamental Frequency Contours, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[20] A. Nuttall. Some windows with very good sidelobe behavior, 1981.
[21] Ingo R. Titze. Principles of voice production, 1994.
[22] Masashi Unoki, et al. Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis, 2005, Speech Commun.
[23] A. L. Wang. Instantaneous and frequency-warped techniques for source separation and signal parametrization, 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Acoustics.
[24] Axel Röbel, et al. A multi-layer F0 model for singing voice synthesis using a B-spline representation with intuitive controls, 2015, INTERSPEECH.
[25] Hideki Kawahara, et al. Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT, 2005, INTERSPEECH.
[26] Thomas F. Quatieri, et al. A time-warping framework for speech turbulence-noise component estimation during aperiodic phonation, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] D. G. Childers, et al. Modeling the glottal volume-velocity waveform for three voice types, 1995, The Journal of the Acoustical Society of America.
[28] David Talkin. A Robust Algorithm for Pitch Tracking (RAPT), 1995, Speech Coding and Synthesis.
[29] Tomoki Toda, et al. Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation, 2015, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[30] Hiroya Fujisaki, et al. Prosody, Models, and Spontaneous Speech, 1997, Computing Prosody.
[31] John G. Harris, et al. A sawtooth waveform inspired pitch estimator for speech and music, 2008, The Journal of the Acoustical Society of America.
[32] Hideki Kawahara. SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component, 2016, INTERSPEECH.
[33] D. G. Childers, et al. Vocal quality factors: analysis, synthesis, and perception, 1991, The Journal of the Acoustical Society of America.
[34] D. Slepian. Prolate spheroidal wave functions, Fourier analysis, and uncertainty - V: the discrete case, 1978, The Bell System Technical Journal.
[35] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.
[36] Yannis Agiomyrgiannakis. Vocaine the vocoder and applications in speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).