Revisiting spectral envelope recovery from speech sounds generated by periodic excitation
[1] I. Titze. Nonlinear source-filter coupling in phonation: theory, 2008, The Journal of the Acoustical Society of America.
[2] Hideki Kawahara, et al. Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework, 2011.
[3] F. Itakura, et al. A statistical method for estimation of speech spectral density and formant frequencies, 1970.
[4] Masanori Morise, et al. Sound quality comparison among high-quality vocoders by using re-synthesized speech, 2018.
[5] Tomoki Toda, et al. Speaker-Dependent WaveNet Vocoder, 2017, INTERSPEECH.
[6] Matti Karjalainen, et al. Reverberation Modeling Using Velvet Noise, 2007.
[7] Thomas F. Quatieri, et al. Speech analysis/Synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.
[8] S. L. Hanauer, et al. Speech Analysis and Synthesis by Linear Prediction of the Speech Wave, 2000.
[9] Hideki Kawahara, et al. Inharmonic speech reveals the role of harmonicity in the cocktail party problem, 2018, Nature Communications.
[10] Tomoki Toda, et al. Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices, 2018, INTERSPEECH.
[11] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[12] M. Unser. Sampling-50 years after Shannon, 2000, Proceedings of the IEEE.
[13] Alan W. Black, et al. The CMU Arctic speech databases, 2004, SSW.
[14] Anders Löfqvist, et al. Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization, 2015, The Journal of the Acoustical Society of America.
[15] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[16] S. Hayashi, et al. Design and description of CS-ACELP: a toll quality 8 kb/s speech coder, 1998, IEEE Trans. Speech Audio Process.
[17] Tomoki Toda, et al. A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis, 2017, INTERSPEECH.
[18] Masanori Morise, et al. CheapTrick, a spectral envelope estimator for high-quality speech synthesis, 2015, Speech Commun.
[19] Gunnar Fant, et al. Acoustic Theory Of Speech Production, 1960.
[20] Satoshi Imai, et al. Cepstral analysis synthesis on the mel frequency scale, 1983, ICASSP.
[21] Yannis Stylianou, et al. Analysis and Synthesis of Speech Using an Adaptive Full-Band Harmonic Model, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[22] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[23] Eric Moulines, et al. High-quality speech modification based on a harmonic + noise model, 1995, EUROSPEECH.
[24] Masanori Morise, et al. Acoustic measurements using a frequency domain velvet noise and interference-free power spectral representations of periodic sounds, 2018.
[25] Amro El-Jaroudi, et al. Discrete all-pole modeling, 1991, IEEE Trans. Signal Process.
[26] D. G. Childers, et al. Modeling the glottal volume-velocity waveform for three voice types, 1995, The Journal of the Acoustical Society of America.
[27] Hideki Kawahara, et al. Temporally variable multi-aspect N-way morphing based on interference-free speech representations, 2013, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.
[28] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[29] Karen Simonyan, et al. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders, 2017, ICML.
[30] Hideki Kawahara, et al. Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT, 2005, INTERSPEECH.
[31] J. W. Hawks, et al. A formant bandwidth estimation procedure for vowel synthesis, 1995, The Journal of the Acoustical Society of America.
[32] A. Oppenheim. Speech analysis-synthesis system based on homomorphic filtering, 1969, The Journal of the Acoustical Society of America.
[33] Hideki Kawahara, et al. Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[34] Hideki Kawahara, et al. Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).
[35] J. Bonada, et al. Synthesis of the Singing Voice by Performance Sampling and Spectral Models, 2007, IEEE Signal Processing Magazine.
[36] Manfred R. Schroeder, et al. Code-excited linear prediction (CELP): High-quality speech at very low bit rates, 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[37] Michael Unser, et al. Splines: a perfect fit for signal and image processing, 1999, IEEE Signal Process. Mag.
[38] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[39] Vesa Välimäki, et al. A Perceptual Study on Velvet Noise and Its Variants at Different Pulse Densities, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[40] Thierry Dutoit, et al. The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications, 2012, IEEE Transactions on Audio, Speech, and Language Processing.