Improving Unsupervised Style Transfer in End-to-End Speech Synthesis with End-to-End Speech Recognition
[1] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[2] Shinnosuke Takamichi,et al. Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Yoav Goldberg,et al. Sequence to Sequence Transduction with Hard Monotonic Attention , 2016, ArXiv.
[5] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[6] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[7] Erik McDermott,et al. Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[9] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[10] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Fabio Tesser,et al. Experiments with signal-driven symbolic prosody for statistical parametric speech synthesis , 2013, SSW.
[12] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Samy Bengio,et al. An Online Sequence-to-Sequence Model Using Partial Conditioning , 2015, NIPS.
[14] Takashi Nose,et al. A Style Control Technique for HMM-Based Expressive Speech Synthesis , 2007, IEICE Trans. Inf. Syst.
[15] Lauri Juvela,et al. Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis , 2017, INTERSPEECH.
[16] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[17] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Junichi Yamagishi,et al. Adapting and controlling DNN-based speech synthesis using input codes , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[20] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[22] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[23] Mark J. F. Gales,et al. Unsupervised clustering of emotion and voice styles for expressive TTS , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Lior Wolf,et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop , 2017, ICLR.
[25] Junichi Yamagishi,et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016.
[26] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[27] Colin Raffel,et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments , 2017, ICML.
[28] Matt Shannon,et al. Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping , 2017, INTERSPEECH.
[29] Keiichi Tokuda,et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst.
[30] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[31] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[32] Jun-Yan Zhu,et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, ICCV.
[33] Manfred K. Warmuth,et al. The CMU Sphinx-4 Speech Recognition System , 2001.
[34] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[35] Satoshi Nakamura,et al. Listening while speaking: Speech chain by deep learning , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[36] Igor Jauk,et al. Unsupervised Learning for Expressive Speech Synthesis , 2017, IberSPEECH.
[37] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[38] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[39] Li-Rong Dai,et al. Speaker verification against synthetic speech , 2010, 2010 7th International Symposium on Chinese Spoken Language Processing.
[40] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.