Slovak Unit-Selection Speech Synthesis: Creating a New Slovak Voice within a Czech TTS System ARTIC

ARTIC (Artificial Talker in Czech) is a corpus-based text-to-speech (TTS) system that can synthesise arbitrary text, primarily in Czech. Two versions of ARTIC are available: a single unit instance system (also known as fixed-inventory synthesis), in which the quality of the resulting speech is limited by the fixed unit inventory, and a multiple unit instance system, whose quality benefits from a unit-selection algorithm that selects the longest suitable sequences of phonetic units from the many units available. This paper presents the process of building a new Slovak voice for the unit-selection version of ARTIC. All steps of the design are described, from the preparation of a suitable speech corpus to the creation of an acoustic unit inventory for the new Slovak voice and its use in the ARTIC system. The text processing module, including automatic phonetic transcription and symbolic prosodic description of arbitrary Slovak text, is detailed. Finally, the speech production module based on the unit-selection algorithm is also described briefly.
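To make the idea of "selecting the longest suitable sequences of units" concrete, the following is a minimal sketch of generic unit selection as a Viterbi search over target and concatenation costs, in the style of Hunt and Black. It is not ARTIC's actual implementation. The `Unit` and `Target` fields, the cost functions and their weights, and the helper names are all hypothetical. The key point it illustrates is that a zero concatenation cost for units that were neighbours in the original recording makes the search favour long contiguous stretches of the corpus.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """One unit instance in a hypothetical acoustic unit inventory."""
    utt: int         # index of the source utterance in the speech corpus
    pos: int         # position of the unit within that utterance
    phone: str       # phonetic label of the unit
    prosody: str     # symbolic prosodic label (hypothetical label set)
    f0_start: float  # F0 in Hz at the unit's left boundary
    f0_end: float    # F0 in Hz at the unit's right boundary


@dataclass(frozen=True)
class Target:
    """Hypothetical specification from text processing for one position."""
    phone: str
    prosody: str


def target_cost(target: Target, unit: Unit) -> float:
    # Illustrative: penalise only a mismatch of the symbolic prosodic label.
    return 0.0 if unit.prosody == target.prosody else 1.0


def concat_cost(left: Unit, right: Unit) -> float:
    # Units that were neighbours in the original recording join seamlessly,
    # so the search prefers long contiguous stretches of the corpus.
    if left.utt == right.utt and left.pos + 1 == right.pos:
        return 0.0
    # Otherwise: a fixed join penalty plus an F0-jump term (illustrative weights).
    return 0.5 + abs(left.f0_end - right.f0_start) / 50.0


def candidates_for(targets: Sequence[Target],
                   inventory: Sequence[Unit]) -> List[List[Unit]]:
    # Candidates for each target: all inventory units with the same phone label.
    return [[u for u in inventory if u.phone == tg.phone] for tg in targets]


def select_units(targets: Sequence[Target],
                 candidates: Sequence[Sequence[Unit]]) -> Tuple[List[Unit], float]:
    """Viterbi search for the unit sequence minimising target + concatenation cost.

    Assumes every candidate list is non-empty.
    """
    if not targets:
        return [], 0.0
    # cost[t][k]: cheapest cumulative cost of any path ending in candidates[t][k]
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[t]:
            # Best predecessor for unit u among the candidates of position t-1.
            scores = [cost[t - 1][j] + concat_cost(prev, u)
                      for j, prev in enumerate(candidates[t - 1])]
            best_j = min(range(len(scores)), key=scores.__getitem__)
            row_cost.append(scores[best_j] + target_cost(targets[t], u))
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrace from the cheapest candidate at the final position.
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][k]
    path = [k]
    for t in range(len(targets) - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)], total
```

For example, given an inventory in which one utterance contains the phones "a", "b" and "c" consecutively, a target sequence "a b c" with matching prosodic labels is served entirely from that stretch at zero cost, even when other instances of "b" or "c" exist elsewhere in the corpus. In a real system the target and concatenation costs would use much richer features, and the search would typically be pruned for speed.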
