Paper information - Semi-supervised Annotator of Speech Corpora and AGH Speech Corpus of Polish

Software for generating professional speech corpora from audiobooks and their corresponding book texts is presented. The software allows speech corpora to be created much faster and more cheaply than with traditional methods. Existing speech resources for Polish are described, with a brief introduction to Polish dialects. An example of a small corpus of Polish made with the described tool is presented as well.

B. Ziółko | B. Miga | T. Jadczyk

[1] Richard Lippmann, et al. Neural Net and Traditional Classifiers, 1987, NIPS.

[2] Stanisław Urbańczyk, et al. Encyklopedia języka polskiego, 1991.

[3] Biing-Hwang Juang, et al. Fundamentals of speech recognition, 1993, Prentice Hall signal processing series.

[4] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[5] Walter Daelemans, et al. Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion, 1996.

[6] S. Young. Large Vocabulary Continuous Speech Recognition: a Review, 1996.

[7] Simon King. Dependence and independence in automatic speech recognition and synthesis, 2003, J. Phonetics.

[8] Grazyna Demenko, et al. JURISDIC: Polish Speech Database for Taking Dictation of Legal Texts, 2008, LREC.

[9] Hermann Ney, et al. Cross-language bootstrapping for unsupervised acoustic model training: rapid development of a Polish speech recognition system, 2009, INTERSPEECH.

[10] Ngoc Thang Vu, et al. Multilingual a-stabil: A new confidence score for multilingual unsupervised training, 2010, 2010 IEEE Spoken Language Technology Workshop.

[11] Mariusz Ziółko, et al. Automatic Speech Recognition System Based on Wavelet Analysis, 2010, 2010 IEEE Fourth International Conference on Semantic Computing.