Heiga Zen | Yu Zhang | Yuan Cao | Patrick Nguyen | Yonghui Wu | Zhifeng Chen | Ye Jia | Ron J. Weiss | Yuxuan Wang | Ruoming Pang | Wei-Ning Hsu | Jonathan Shen
[1] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[2] Sercan Ömer Arik, et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2017, ICLR.
[3] Lior Wolf, et al. Fitting New Speakers Based on a Short Untranscribed Sample, 2018, ICML.
[4] Heiga Zen, et al. Statistical Parametric Speech Synthesis, 2007, ICASSP.
[5] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[6] Huachun Tan, et al. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering, 2016, IJCAI.
[7] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[8] Lars Hertel, et al. Approximate Inference for Deep Latent Gaussian Mixtures, 2016.
[9] Christopher Burgess, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2016, ICLR.
[10] Patrick Nguyen, et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, 2018, NeurIPS.
[11] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.
[12] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[13] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[14] Yutaka Matsuo, et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder, 2018, INTERSPEECH.
[15] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[16] Sercan Ömer Arik, et al. Neural Voice Cloning with a Few Samples, 2018, NeurIPS.
[17] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[18] L. Streeter, et al. Effects of Pitch and Speech Rate on Personal Attributions, 1979.
[19] Tara N. Sainath, et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, 2017, INTERSPEECH.
[20] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[21] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, ICASSP.
[22] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[23] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2018, ICASSP.
[24] Hao Tang, et al. Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition, 2018, INTERSPEECH.
[25] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[26] Yu Zhang, et al. Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation, 2017, ASRU.
[27] Alexander M. Rush, et al. Semi-Amortized Variational Autoencoders, 2018, ICML.
[28] Satoshi Nakamura, et al. Listening while speaking: Speech chain by deep learning, 2017, ASRU.
[29] Tomoki Toda, et al. Back-Translation-Style Data Augmentation for End-to-End ASR, 2018, SLT.
[30] Satoshi Nakamura, et al. Machine Speech Chain with One-shot Speaker Adaptation, 2018, INTERSPEECH.
[31] John W. Black, et al. Relationships among Fundamental Frequency, Vocal Sound Pressure, and Rate of Speaking, 1961.
[32] Murray Shanahan, et al. Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders, 2016, arXiv.
[33] Xin Wang, et al. Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis, 2018, arXiv.
[34] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[35] S. Boll, et al. Suppression of acoustic noise in speech using spectral subtraction, 1979.
[36] Zhiting Hu, et al. Improved Variational Autoencoders for Text Modeling using Dilated Convolutions, 2017, ICML.
[37] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[38] Yu Zhang, et al. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data, 2017, NIPS.
[39] Richard M. Stern, et al. Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis, 2008, INTERSPEECH.
[40] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[41] Lior Wolf, et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop, 2017, ICLR.
[42] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[43] Samy Bengio, et al. Generating Sentences from a Continuous Space, 2015, CoNLL.
[44] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.