Broadcast news transcription in Central-East European languages

This paper addresses two main issues: first, how to develop broadcast news transcription systems for Central-East European languages within a short time when only limited language-specific knowledge is available; and second, how to improve an existing system by means of an online learning method. Accordingly, we present recognition results for two newly developed news transcription systems, for Polish and Romanian, which were trained in a fully data-driven manner using only a few hours of manual transcriptions and web materials. In addition, we present an automatic language model updating method for our Hungarian transcription system. Continuously updating the language model yielded a 2% relative reduction in WER (Word Error Rate), measured over a 3-month period. The gain is primarily due to better matching of the language model parameters for IV (in-vocabulary) words and secondarily due to the reduction of OOV (out-of-vocabulary) words. To the best of our knowledge, this study publishes the first Romanian broadcast news recognition results.
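The abstract reports its result as a *relative* WER reduction and attributes part of it to fewer OOV words. The Python sketch below shows how these three quantities are conventionally computed. It is not the authors' scoring tool. The function names (`word_errors`, `wer`, `relative_reduction`, `oov_rate`) are hypothetical, and the sketch assumes simple whitespace tokenization with no text normalization, which real evaluation pipelines usually apply first.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the reference into the hypothesis (word-level Levenshtein)."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative (not absolute) WER reduction of a new system over a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def oov_rate(references: list[str], vocabulary: set[str]) -> float:
    """Fraction of reference tokens that are not in the recognizer's vocabulary."""
    tokens = [w for r in references for w in r.split()]
    return sum(w not in vocabulary for w in tokens) / len(tokens)


if __name__ == "__main__":
    refs = ["a b c d"]
    hyps = ["a x c"]  # one substitution (b -> x) and one deletion (d)
    print(wer(refs, hyps))                     # 2 errors / 4 words
    print(oov_rate(refs, {"a", "b", "c"}))     # "d" is out of vocabulary
    print(relative_reduction(0.25, 0.245))     # hypothetical WER values
```

A relative reduction is measured against the baseline error rate, not in percentage points. For example, a hypothetical drop from 25.0% to 24.5% WER corresponds to a 2% relative reduction, but only a 0.5 point absolute reduction.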
