Automatic Transcription and Subtitling of Slovak Multi-genre Audiovisual Recordings

This paper summarizes recent progress in the development of an automatic transcription system for subtitling Slovak multi-genre audiovisual recordings, such as lectures, talks, discussions, broadcast news, and TV or radio shows. The approach applies current and innovative principles and methods from speech and language processing: automatic speech segmentation, speech recognition, statistical modeling, and adaptation of acoustic and language models to the specific topic, gender, and speaking style of the speaker. We have developed a working prototype of the automatic transcription system for the Slovak language, designed mainly for subtitling various types of single- or multi-channel audiovisual recordings. Preliminary results show a significant relative decrease in word error rate, ranging from 2.40% to 47.10% for an individual speaker, in fully automatic transcription and subtitling of Slovak parliament speech, broadcast news, and TEDx talks.
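To make the reported metric concrete, the following is a minimal sketch of how a word error rate (WER) and its relative reduction are commonly computed. It is not the authors' evaluation code: the function names and the example numbers are hypothetical and chosen only for illustration, and the sketch assumes whitespace-tokenized transcripts with WER defined as word-level edit distance divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative decrease of the adapted WER with respect to the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    baseline = 0.20  # WER of an unadapted system
    adapted = 0.15   # WER after speaker or topic adaptation
    print(f"relative reduction: {relative_wer_reduction(baseline, adapted):.2%}")
```

Under this definition, a relative decrease is measured against the baseline system's own error rate, so the same relative figure can correspond to very different absolute WER changes.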
