Automatic Transcription of Polish Radio and Television Broadcast Audio

This paper describes a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing Polish television and radio broadcast audio. It is one of the first attempts at speech recognition of broadcast audio in Polish. The system uses a hybrid connectionist recognizer based on a recurrent neural network architecture. Training is based on an extensive set of manually transcribed and verified recordings of television and radio shows, supplemented by a large collection of text from online sources, mostly up-to-date news articles. The paper describes and evaluates some of the key components of the architecture and compares the system to a conventional HMM-based architecture. An application of the system to indexing and searching for terms within audio and video transcripts is also described.
