Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information
Adam Nadolski | Thomas Drugman | Thomas Merritt | Alexis Moinet | Roberto Barra-Chicote | Bartosz Putrycz | Viacheslav Klimkov
[1] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[2] Alan W. Black,et al. A Grammar Based Approach to Style Specific Phrase Prediction , 2011, INTERSPEECH.
[3] Mari Ostendorf,et al. A Hierarchical Stochastic Model for Automatic Prediction of Prosodic Boundary Location , 1994, CL.
[4] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[5] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[6] David Escudero Mancebo,et al. Filled Pauses in Speech Synthesis: Towards Conversational Speech , 2007, TSD.
[7] Alok Parlikar. Style-Specific Phrasing in Speech Synthesis , 2013 .
[8] Suryakanth V. Gangashetty,et al. An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break Prediction , 2016, INTERSPEECH.
[9] Mitchell P. Marcus,et al. Building a large annotated corpus of English , 1993 .
[10] Alan W. Black,et al. Data-driven phrasing for speech synthesis in low-resource languages , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[13] Mari Ostendorf,et al. ToBI: a standard for labeling English prosody , 1992, ICSLP.
[14] Oliver Watts,et al. Neural net word representations for phrase-break prediction without a part of speech tagger , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Paul Taylor,et al. Assigning phrase breaks from part-of-speech sequences , 1997, Comput. Speech Lang.
[16] Antonio Bonafonte,et al. Prosodic Break Prediction with RNNs , 2016, IberSPEECH.
[17] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[18] Julia Hirschberg,et al. Automatic classification of intonational phrase boundaries , 1992 .
[19] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[20] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[21] Andrew Rosenberg,et al. AutoBI - a tool for automatic ToBI annotation , 2010, INTERSPEECH.
[22] Mark Fishel,et al. Modelling the temporal structure of newsreaders' speech on neural networks for Estonian text-to-speech synthesis , 2006 .
[23] Jens Apel,et al. Have a break! Modelling pauses in German Speech , 2004 .
[24] Hiroyuki Shindo,et al. A latent variable model for joint pause prediction and dependency parsing , 2015, INTERSPEECH.