Text-to-Speech Synthesis Using Found Data for Low-Resource Languages

[1] Simon King et al. Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech. 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] Julia Hirschberg et al. Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues. 2018, INTERSPEECH.

[3] Daniel Povey et al. The Kaldi Speech Recognition Toolkit. 2011.

[4] Julia Hirschberg et al. Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis. 2016, INTERSPEECH.

[5] Julia Hirschberg et al. Acoustic/prosodic and lexical correlates of charismatic speech. 2005, INTERSPEECH.

[6] Louis C. W. Pols et al. Frisian TTS, an example of bootstrapping TTS for minority languages. 2004, Speech Synthesis Workshop.

[7] Hermann Ney et al. Joint-sequence models for grapheme-to-phoneme conversion. 2008, Speech Communication.

[8] Junichi Yamagishi et al. Average-Voice-Based Speech Synthesis. 2006.

[9] R. H. Bernacki et al. Effects of noise on speech production: acoustic and perceptual analyses. 1988, The Journal of the Acoustical Society of America.

[10] Alan W. Black et al. Adaptation techniques for speech synthesis in under-resourced languages. 2010, SLTU.

[11] Samy Bengio et al. Tacotron: Towards End-to-End Speech Synthesis. 2017, INTERSPEECH.

[12] Matt Post et al. The Language Demographics of Amazon Mechanical Turk. 2014, TACL.

[13] R. Kubichek et al. Mel-cepstral distance measure for objective speech quality assessment. 1993, Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing.

[14] Simon King et al. Evaluation of objective measures for intelligibility prediction of HMM-based synthetic speech in noise. 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15] Marc Schröder et al. The German Text-to-Speech Synthesis System MARY: A Tool for Research, Development and Teaching. 2003, International Journal of Speech Technology.

[16] Oliver Watts et al. Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis. 2013, SSW.

[17] Paul Boersma et al. Praat, a system for doing phonetics by computer. 2002.

[18] Sabine Buchholz et al. Automatic Sentence Selection from Speech Corpora Including Diverse Speech for Improved HMM-TTS Synthesis Quality. 2011, INTERSPEECH.

[19] Bogdan Orza et al. The SWARA speech corpus: A large parallel Romanian read speech dataset. 2017, 2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD).

[20] Simon King et al. A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. 2014, INTERSPEECH.

[21] Zhizheng Wu et al. Merlin: An Open Source Neural Network Speech Synthesis System. 2016, SSW.

[22] Julia Hirschberg et al. Comparing American and Palestinian perceptions of charisma using acoustic-prosodic and lexical analysis. 2007, INTERSPEECH.

[23] A. W. Black et al. Unit selection without a phoneme set. 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis.

[24] Heiga Zen et al. Statistical parametric speech synthesis using deep neural networks. 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25] Alan W. Black et al. Utterance Selection Techniques for TTS Systems Using Found Speech. 2016, SSW.

[26] Avashna Govender et al. Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. 2016, 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech).

[27] Simon King et al. Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora. 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[28] Nivja H. de Jong et al. Praat script to detect syllable nuclei and measure speech rate automatically. 2009, Behavior Research Methods.

[29] Mickael Rouvier et al. An open-source state-of-the-art toolbox for broadcast news diarization. 2013, INTERSPEECH.

[30] Xin Wang et al. A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora. 2016, SSW.

[31] Yoshua Bengio et al. Char2Wav: End-to-End Speech Synthesis. 2017, ICLR.

[32] Alan W. Black et al. Text to speech in new languages without a standardized orthography. 2013, SSW.