Generating the Voice of the Interactive Virtual Assistant
[1] Sercan Ömer Arik et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2017, ICLR 2018.
[2] Sungwon Kim et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search, 2020, NeurIPS.
[3] Massimo Giustiniani et al. A hidden Markov model approach to speech synthesis, 1989, EUROSPEECH.
[4] Gonçalo Simões et al. Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, 2018, ACL.
[5] Shujie Liu et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[6] Heiga Zen et al. Statistical parametric speech synthesis using deep neural networks, 2013, ICASSP.
[7] Christian Dittmar et al. A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction, 2019, SSW.
[8] Dennis H. Klatt et al. Software for a cascade/parallel formant synthesizer, 1980.
[9] Shuang Liang et al. Flow-TTS: A Non-Autoregressive Network for Text to Speech Based on Flow, 2020, ICASSP.
[10] Marius Cotescu et al. Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech, 2020, ICASSP.
[11] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[12] Yoshua Bengio et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[13] Marc Schröder et al. Open Source Voice Creation Toolkit for the MARY TTS Platform, 2011, INTERSPEECH.
[14] Bryan Catanzaro et al. Flowtron: An Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis, 2021, ICLR.
[15] Hema A. Murthy et al. Natural sounding TTS based on syllable-like units, 2006, European Signal Processing Conference (EUSIPCO).
[16] Zhizheng Wu et al. Merlin: An Open Source Neural Network Speech Synthesis System, 2016, SSW.
[17] M. G. Rahim et al. Articulatory synthesis with the aid of a neural net, 1989, ICASSP.
[18] Bowen Zhou et al. Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed, 2020, INTERSPEECH.
[19] Ryan Prenger et al. Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens, 2019, ICASSP 2020.
[20] Hung-yi Lee et al. WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU, 2020, INTERSPEECH.
[21] Jason Taylor et al. Enhancing Sequence-to-Sequence Text-to-Speech with Morphology, 2020, INTERSPEECH.
[22] Lei Xie et al. A New GAN-based End-to-End TTS Training Algorithm, 2019, INTERSPEECH.
[23] Adriana Stan. RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications, 2020, INTERSPEECH.
[24] Dong Yu et al. Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[25] W. S. McCulloch et al. A logical calculus of the ideas immanent in nervous activity, 1990, The Philosophy of Artificial Intelligence.
[26] Sungwon Kim et al. FloWaveNet: A Generative Flow for Raw Audio, 2018, ICML.
[27] Jing Xiao et al. MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution, 2020, arXiv.
[28] Hideyuki Tachibana et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention, 2017, ICASSP 2018.
[29] D. Lim et al. JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment, 2020, INTERSPEECH.
[30] Wei Ping et al. Non-Autoregressive Neural Text-to-Speech, 2020, ICML.
[31] Zhen-Hua Ling et al. Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis, 2018, ICASSP 2019.
[32] Soroosh Mariooryad et al. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis, 2020, arXiv.
[33] Luke S. Zettlemoyer et al. Deep Contextualized Word Representations, 2018, NAACL.
[34] Petr Motlícek et al. Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN, 2016, INTERSPEECH.
[35] Tao Qin et al. MultiSpeech: Multi-Speaker Text to Speech with Transformer, 2020, INTERSPEECH.
[36] Junichi Yamagishi et al. An experimental comparison of multiple vocoder types, 2013, SSW.
[37] Masanori Morise et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[38] Michael Schoeffler et al. webMUSHRA - A Comprehensive Framework for Web-based Listening Tests, 2018.
[39] Hideki Kawahara et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[40] Shuang Xu et al. First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention, 2016, INTERSPEECH.
[41] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[42] Lior Wolf et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop, 2017, ICLR.
[43] Ariya Rastrow et al. Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion, 2019, INTERSPEECH.
[44] Ryuichi Yamamoto et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram, 2020, ICASSP.
[45] Paul Taylor et al. Festival Speech Synthesis System, 1998.
[46] Bajibabu Bollepalli et al. GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram, 2019, INTERSPEECH.
[47] Jacob Benesty et al. Springer Handbook of Speech Processing, 2007, Springer Handbooks.
[48] David B. Pisoni et al. Text-to-Speech: The MITalk System, 1987.
[49] Oliver Watts et al. Where do the improvements come from in sequence-to-sequence neural TTS?, 2019.
[50] Samy Bengio et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[51] Youngik Kim et al. VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network, 2020, INTERSPEECH.
[52] Tian Xia et al. AlignTTS: Efficient Feed-Forward Text-to-Speech System Without Explicit Alignment, 2020, ICASSP.
[53] Kenneth N. Stevens et al. A Framework for Synthesis of Segments Based on Pseudoarticulatory Parameters, 1997.
[54] Lukasz Kaiser et al. Attention Is All You Need, 2017, NIPS.
[55] Tao Qin et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[56] Heiga Zen et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[57] Heiga Zen et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech, 2019, INTERSPEECH.
[58] Jan Skoglund et al. LPCNet: Improving Neural Speech Synthesis through Linear Prediction, 2018, ICASSP 2019.
[59] Cassia Valentini-Botinhao et al. Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations, 2015, INTERSPEECH.
[60] Alan W. Black et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, ICASSP.
[61] Daniel Tihelka et al. Hybrid syllable/triphone speech synthesis, 2005, INTERSPEECH.
[62] Nam Soo Kim et al. Reformer-TTS: Neural Speech Synthesis with Reformer Network, 2020, INTERSPEECH.
[63] Chengzhu Yu et al. DurIAN: Duration Informed Attention Network for Speech Synthesis, 2020, INTERSPEECH.
[64] Yoshua Bengio et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[65] Simon King et al. An introduction to statistical parametric speech synthesis, 2011.
[66] Kainan Peng et al. WaveFlow: A Compact Flow-based Model for Raw Audio, 2020, ICML.
[67] Heiga Zen et al. Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling, 2020, arXiv.
[68] Brian Roark et al. Neural Models of Text Normalization for Speech Applications, 2019, Computational Linguistics.
[69] Junichi Yamagishi et al. Emotion transplantation through adaptation in HMM-based speech synthesis, 2015, Comput. Speech Lang.
[70] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[71] Wei Ping et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech, 2018, ICLR.
[72] Jae Lim et al. Signal estimation from modified short-time Fourier transform, 1984.
[73] Adam Coates et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[74] Adriana Stan et al. Deep Learning for Automatic Diacritics Restoration in Romanian, 2019, IEEE International Conference on Intelligent Computer Communication and Processing (ICCP).
[75] Heiga Zen et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005, 2007, IEICE Trans. Inf. Syst.
[76] Zhi-Jie Yan et al. A Unified Trajectory Tiling Approach to High Quality Speech Rendering, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[77] Takashi Saito et al. High-quality speech synthesis using context-dependent syllabic units, 1996, ICASSP.
[78] Ryan Prenger et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019.
[79] Alex Acero et al. Spoken Language Processing: A Guide to Theory, Algorithm and System Development, 2001.
[80] Navdeep Jaitly et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP 2018.
[81] Xu Tan et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[82] R. H. Stetson. Motor phonetics: a study of speech movements in action, 1951.
[83] Heiga Zen et al. Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior, 2020, ICASSP.
[84] Alan W. Black et al. CMU Wilderness Multilingual Speech Dataset, 2019, ICASSP.
[85] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[86] Paul Taylor et al. Text-to-Speech Synthesis, 2009.
[87] Zhen-Hua Ling et al. The use of articulatory movement data in speech synthesis applications: An overview - Application of articulatory movements using machine learning algorithms, 2015.
[88] Xin Wang et al. End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE, 2020, ICASSP 2021.
[89] Mireia Farrús et al. Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding, 2020, INTERSPEECH.
[90] Simon King et al. Thousands of Voices for HMM-Based Speech Synthesis: Analysis and Application of TTS Systems Built on Various ASR Corpora, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[91] Nick Campbell et al. Optimising selection of units from speech databases for concatenative synthesis, 1995, EUROSPEECH.
[92] Tanya Lambert et al. A database design for a TTS synthesis system using lexical diphones, 2004, INTERSPEECH.
[93] Kurt Keutzer et al. SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis, 2020, arXiv.
[94] Yu Zhang et al. Learning Latent Representations for Speech Generation and Transformation, 2017, INTERSPEECH.
[95] Geoffrey E. Hinton. Learning multiple layers of representation, 2007, Trends in Cognitive Sciences.
[96] Adam Finkelstein et al. FFTNet: A Real-Time Speaker-Dependent Neural Vocoder, 2018, ICASSP.
[97] Heiga Zen et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[98] Zhiyong Wu et al. Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis, 2019, INTERSPEECH.
[99] Tomoki Toda et al. ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit, 2020, ICASSP.