Paper information - Vergina: A Modern Greek Speech Database for Speech Synthesis

Vergina: A Modern Greek Speech Database for Speech Synthesis

This paper presents the Vergina speech database, which was developed to support research and development of corpus-based unit selection and statistical parametric speech synthesis systems for the Modern Greek language. We describe the design, development, and implementation of the recording campaign, as well as the annotation of the database. Specifically, a text corpus of approximately 5 million words, collected from newspaper articles, periodicals, and paragraphs of literature, was processed to select the utterances needed for the speech database and to achieve reasonable phonetic coverage. Because the text corpus draws on different domains and writing styles, the selected utterances have broad coverage, which makes the database appropriate for various application domains. The database, recorded in an audio studio, consists of approximately 3,000 phonetically balanced Modern Greek utterances, corresponding to approximately four hours of speech. The Vergina speech database was annotated using task-specific tools based on a hidden Markov model (HMM) segmentation method, followed by manual inspection and correction.
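
The abstract does not state which procedure was used to pick utterances from the text corpus. As a rough illustration only, one common way to approach phonetic coverage is greedy selection: repeatedly take the sentence that adds the most units not yet covered. The sketch below assumes diphones as the coverage unit and uses toy phone labels; the function names `diphones` and `greedy_select` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def diphones(phones: List[str]) -> Set[str]:
    """Return the set of adjacent phone pairs in a phone sequence."""
    return {f"{a}-{b}" for a, b in zip(phones, phones[1:])}


def greedy_select(candidates: Dict[str, Set[str]], max_sentences: int) -> List[str]:
    """Greedily pick sentence ids that each add the most uncovered units.

    `candidates` maps a sentence id to the set of units it contains.
    Ties are broken by the smallest sentence id (an arbitrary choice).
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        # Sentence that contributes the largest number of new units.
        best = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Toy example with made-up phone sequences (not real Greek transcriptions).
toy_corpus = {
    "s1": diphones(["k", "a", "l", "i"]),
    "s2": diphones(["k", "a"]),
    "s3": diphones(["m", "e", "r", "a", "s"]),
}
selected = greedy_select(toy_corpus, max_sentences=10)
```

Here `s2` is never selected because its only diphone is already covered by `s1`. A real selection pass would also need a grapheme-to-phoneme step and possibly frequency weighting, neither of which is shown.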
