Data-driven phrasing for speech synthesis in low-resource languages

We present an approach to building phrase break prediction models for text-to-speech synthesis in low-resource languages. The method does not depend on the availability of part-of-speech taggers or of corpora with hand-annotated breaks. We deduce acoustic phrase breaks from the same speech data that is used to build the synthetic voice. We perform unsupervised part-of-speech induction over a small text corpus in the target language, and use the induced tags to train a grammar-based phrasing model. We report results for English, Portuguese, and Marathi, which suggest that reasonable phrasing models can be built quickly for new languages from very little data.
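As a rough illustration of the pipeline described above, the following sketch labels phrase breaks from word timings and then estimates break probabilities from induced tags. It rests on several assumptions that are not stated in the abstract: breaks are taken to be inter-word pauses above a hypothetical threshold (`pause_threshold`), the induced tags are assumed to be given as cluster labels such as `c1`, and a simple tag-pair frequency model stands in for the grammar-based model. The function names `label_breaks` and `break_probabilities` are illustrative only.

```python
from collections import Counter


def label_breaks(words, pause_threshold=0.15):
    """Mark a break after each word that is followed by a long pause.

    words: list of (token, start_sec, end_sec) in time order.
    pause_threshold: hypothetical pause length in seconds (an assumption).
    Returns a list of (token, is_break) pairs.
    """
    labeled = []
    for i, (token, _start, end) in enumerate(words):
        if i + 1 < len(words):
            pause = words[i + 1][1] - end
            labeled.append((token, pause >= pause_threshold))
        else:
            # The end of the utterance is treated as a break.
            labeled.append((token, True))
    return labeled


def break_probabilities(tagged):
    """Estimate P(break | tag, next_tag) by relative frequency.

    tagged: list of (tag, is_break) pairs for one utterance.
    This is a simple stand-in, not the grammar-based model.
    """
    totals, breaks = Counter(), Counter()
    for (tag, is_break), (next_tag, _) in zip(tagged, tagged[1:]):
        key = (tag, next_tag)
        totals[key] += 1
        if is_break:
            breaks[key] += 1
    return {key: breaks[key] / totals[key] for key in totals}


# Toy utterance with made-up timings and made-up induced tags.
words = [("the", 0.00, 0.10), ("cat", 0.12, 0.40),
         ("sat", 0.70, 0.95), ("down", 0.97, 1.30)]
labeled = label_breaks(words)
induced_tags = ["c1", "c2", "c3", "c4"]
tagged = [(tag, is_break) for tag, (_, is_break) in zip(induced_tags, labeled)]
probs = break_probabilities(tagged)
```

In practice the word timings would come from the alignments produced while building the voice, and the tag-pair counts would be pooled over all utterances rather than computed for one.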
