Srikanth Ronanki | Thomas Merritt | Jaime Lorenzo-Trueba | Thomas Drugman | Roberto Barra-Chicote | Nishant Prateek | Mateusz Lajszczak | Trevor Wood
[1] Srikanth Ronanki, et al. Effect of Data Reduction on Sequence-to-sequence Neural TTS, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] R. Kubichek, et al. Mel-cepstral distance measure for objective speech quality assessment, 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[3] Nick Campbell, et al. Optimising selection of units from speech databases for concatenative synthesis, 1995, EUROSPEECH.
[4] Zhizheng Wu, et al. Deep neural network-guided unit selection synthesis, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Tomohiro Nakatani, et al. A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments, 2008, Speech Commun.
[6] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[7] Zhi-Jie Yan, et al. A Unified Trajectory Tiling Approach to High Quality Speech Rendering, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[8] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[9] Yannis Agiomyrgiannakis, et al. Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders, 2017, INTERSPEECH.
[10] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[11] Sercan Ömer Arik, et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning, 2017, ICLR.
[12] Adam Nadolski, et al. Comprehensive Evaluation of Statistical Speech Waveform Synthesis, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[13] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[14] Paul Taylor, et al. The target cost formulation in unit selection speech synthesis, 2006, INTERSPEECH.
[15] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[16] Ming Zhou, et al. Close to Human Quality TTS with Transformer, 2018, ArXiv.
[17] Sanjoy Dasgupta, et al. Adaptive Control Processes, 2010, Encyclopedia of Machine Learning and Data Mining.
[18] Christian S. Perone, et al. Evaluation of sentence embeddings in downstream and linguistic probing tasks, 2018, ArXiv.
[19] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[20] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[21] Xin Wang, et al. Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis, 2018, ArXiv.
[22] Yuxuan Wang, et al. Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[23] Sercan Ömer Arik, et al. Neural Voice Cloning with a Few Samples, 2018, NeurIPS.
[24] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[26] Heiga Zen, et al. Sample Efficient Adaptive Text-to-Speech, 2018, ICLR.
[27] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[28] Bajibabu Bollepalli, et al. Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention, 2018, ArXiv.
[29] Peter L. Søndergaard, et al. A fast Griffin-Lim algorithm, 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[30] Yuxuan Wang, et al. Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Yutaka Matsuo, et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder, 2018, INTERSPEECH.
[32] Patrick Nguyen, et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, 2018, NeurIPS.
[33] Heiga Zen, et al. Hierarchical Generative Modeling for Controllable Speech Synthesis, 2018, ICLR.
[34] Thomas Drugman, et al. Robust universal neural vocoding, 2018, ArXiv.
[35] Luke S. Zettlemoyer, et al. AllenNLP: A Deep Semantic Natural Language Processing Platform, 2018, ArXiv.
[36] David A. Krubsack, et al. An autocorrelation pitch detector and voicing decision with confidence measures developed for noise-corrupted speech, 1991, IEEE Trans. Signal Process.