A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis
Bajibabu Bollepalli | Lauri Juvela | Paavo Alku | Junichi Yamagishi | Manu Airaksinen
[1] Heiga Zen, et al. Speech Synthesis Based on Hidden Markov Models, 2013, Proceedings of the IEEE.
[2] Lauri Juvela, et al. Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort, 2014, INTERSPEECH.
[3] Susan Fitt, et al. On generating combilex pronunciations via morphological analysis, 2010, INTERSPEECH.
[4] Paavo Alku, et al. Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering, 1991, Speech Commun.
[5] Keiichi Tokuda, et al. Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis, 2005, Systems and Computers in Japan.
[6] Cassia Valentini-Botinhao, et al. Hurricane natural speech corpus, 2013.
[7] Simon King, et al. The listening talker: A review of human and algorithmic context-induced modifications of speech, 2014, Comput. Speech Lang.
[8] Vincent Pollet, et al. Uniform Speech Parameterization for Multi-Form Segment Synthesis, 2011, INTERSPEECH.
[9] Mark J. F. Gales, et al. A Pulse Model in Log-domain for a Uniform Synthesizer, 2016, SSW.
[10] Zhizheng Wu, et al. Merlin: An Open Source Neural Network Speech Synthesis System, 2016, SSW.
[11] Yves Kamp, et al. Robust signal selection for linear prediction analysis of voiced speech, 1993, Speech Commun.
[12] Bajibabu Bollepalli, et al. High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Slava Shechtman, et al. Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities, 2017, INTERSPEECH.
[14] Samy Bengio, et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model, 2017, ArXiv.
[15] Paavo Alku, et al. Effects of Training Data Variety in Generating Glottal Pulses from Acoustic Features with DNNs, 2017, INTERSPEECH.
[16] Thomas P. Barnwell, et al. McCree and Barnwell mixed excitation LPC vocoder model, 2004.
[17] Bajibabu Bollepalli, et al. Glottal Vocoding With Frequency-Warped Time-Weighted Linear Prediction, 2017, IEEE Signal Processing Letters.
[18] Tom Bäckström, et al. Speech Coding: with Code-Excited Linear Prediction, 2017.
[19] F. Itakura. Line spectrum representation of linear predictor coefficients of speech signals, 1975.
[20] Logan Volkers, et al. Phase Vocoder, 2008.
[21] Hirokazu Kameoka, et al. Generative adversarial network-based postfilter for statistical parametric speech synthesis, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Frank K. Soong, et al. TTS synthesis with bidirectional LSTM based recurrent neural networks, 2014, INTERSPEECH.
[23] Yannis Stylianou, et al. Voice Transformation: A survey, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[24] Daniel Erro, et al. A uniform phase representation for the harmonic model in speech synthesis applications, 2014, EURASIP J. Audio Speech Music Process.
[25] Bhuvana Ramabhadran, et al. Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores, 2017, INTERSPEECH.
[26] P. Alku, et al. Formant frequency estimation of high-pitched vowels using weighted linear prediction, 2013, The Journal of the Acoustical Society of America.
[27] Unto K. Laine, et al. A comparison of warped and conventional linear predictive coding, 2001, IEEE Trans. Speech Audio Process.
[28] Paavo Alku, et al. Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise, 2014, Comput. Speech Lang.
[29] Junichi Yamagishi, et al. An experimental comparison of multiple vocoder types, 2013, SSW.
[30] Keiichi Tokuda, et al. Speech synthesis using HMMs with dynamic features, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[31] 서종수, et al. 四季 引 festival, 2009.
[32] Paavo Alku, et al. Quasi Closed Phase Glottal Inverse Filtering Analysis With Weighted Linear Prediction, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[34] Perry R. Cook, et al. Toward the Perfect Audio Morph? Singing Voice Synthesis and Processing, 2007.
[35] Paavo Alku, et al. Wideband Parametric Speech Synthesis Using Warped Linear Prediction, 2012, INTERSPEECH.
[36] Paavo Alku, et al. HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[37] H. Strube. Linear prediction on a warped frequency scale, 1980.
[38] Junichi Yamagishi, et al. Glottal Spectral Separation for Speech Synthesis, 2014, IEEE Journal of Selected Topics in Signal Processing.
[39] David Talkin, et al. A Robust Algorithm for Pitch Tracking (RAPT), 2005.
[40] Lauri Juvela, et al. Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks, 2016, INTERSPEECH.
[41] Hideki Kawahara, et al. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, 2001, MAVEBA.
[42] Nicolas Sturmel, et al. Phase-Based Methods for Voice Source Analysis, 2007, NOLISP.
[43] Methods for Subjective Determination of Transmission Quality, 2022.
[44] Paavo Alku, et al. Comparison of formant enhancement methods for HMM-based speech synthesis, 2010, SSW.
[45] Julius O. Smith, et al. Bark and ERB bilinear transforms, 1999, IEEE Trans. Speech Audio Process.
[46] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[47] S. King, et al. The Blizzard Challenge 2011, 2011.
[48] Eric Moulines, et al. Non-parametric techniques for pitch-scale and time-scale modification of speech, 1995, Speech Commun.
[49] Keiichi Tokuda, et al. Mel-generalized cepstral analysis - a unified approach to speech spectral estimation, 1994, ICSLP.
[50] Yannis Agiomyrgiannakis, et al. Vocaine the vocoder and applications in speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Zhizheng Wu, et al. Investigating gated recurrent neural networks for speech synthesis, 2016, ArXiv.
[52] Inma Hernáez, et al. Harmonics Plus Noise Model Based Vocoder for Statistical Parametric Speech Synthesis, 2014, IEEE Journal of Selected Topics in Signal Processing.
[53] Bajibabu Bollepalli, et al. GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis, 2016, INTERSPEECH.
[54] John G. Harris, et al. A sawtooth waveform inspired pitch estimator for speech and music, 2008, The Journal of the Acoustical Society of America.
[55] Thierry Dutoit, et al. The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[56] Paavo Alku, et al. Voice source modelling using deep neural networks for statistical parametric speech synthesis, 2014, 2014 22nd European Signal Processing Conference (EUSIPCO).
[57] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[58] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.