Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis