Graphspeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis
Berrak Sisman | Rui Liu | Haizhou Li
[1] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.
[2] Hai Zhao,et al. Global Greedy Dependency Parsing , 2020, AAAI.
[3] Haizhou Li,et al. Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[4] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[5] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[6] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] Haizhou Li,et al. Teacher-Student Training For Robust Tacotron-Based TTS , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Jing Xiao,et al. GraphTTS: Graph-to-Sequence Modelling in Neural Text-to-Speech , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Zhuosheng Zhang,et al. SG-Net: Syntax-Guided Machine Reading Comprehension , 2019, AAAI.
[10] Lei Xie,et al. Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis , 2020, INTERSPEECH.
[11] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[12] Berrak Sisman,et al. Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Shinnosuke Takamichi,et al. Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis , 2020, Speech Commun.
[14] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[15] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[16] Deng Cai,et al. Graph Transformer for Graph-to-Sequence Learning , 2019, AAAI.
[17] Haizhou Li,et al. Expressive TTS Training with Frame and Style Reconstruction Loss , 2020, ArXiv.
[18] Jae Lim,et al. Signal estimation from modified short-time Fourier transform , 1984.
[19] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[20] Haizhou Li,et al. Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion , 2018, INTERSPEECH.
[21] Haizhou Li,et al. WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss , 2020, ArXiv.
[22] Frank K. Soong,et al. Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Christopher D. Manning,et al. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages , 2020, ACL.
[24] Hui Zhang,et al. A LSTM Approach with Sub-Word Embeddings for Mongolian Phrase Break Prediction , 2018, COLING.
[25] Lei Xie,et al. On the localness modeling for the self-attention based end-to-end speech synthesis , 2020, Neural Networks.
[26] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[27] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Hui Zhang,et al. Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model , 2018, INTERSPEECH.
[29] Yuji Matsumoto. MaltParser: A language-independent system for data-driven dependency parsing , 2005.
[30] Samy Bengio,et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model , 2017, ArXiv.
[31] Simon King,et al. An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[32] Haizhou Li,et al. Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS , 2020, IEEE Signal Processing Letters.
[33] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[34] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[35] Heiga Zen,et al. Speech Synthesis Based on Hidden Markov Models , 2013, Proceedings of the IEEE.
[36] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[37] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).