Paper information - Post-processing of the recognized speech for web presentation of large audio archive

This paper deals with the post-processing phase of the automatic transcription of spoken documents stored in the large Czech Radio audio archive, which contains hundreds of thousands of recordings. The ultimate goal of the project is to transcribe these recordings and to make their content publicly accessible. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings. The post-processing is adapted to the needs of the web presentation of the archive and has so far been used to process about 60,000 audio documents. We present the overall structure of the system as well as its core modules: the speech recognition engine, the speaker diarization module, and the final text processing. Special attention is paid to punctuation: the accuracy of the automatically inserted punctuation is evaluated and compared with human punctuation use. In the final part of the paper we propose further improvements and ideas for future research.

Michaela Kucharová | Karel Blavka | Marek Bohac | Svatava Skodová

[1] Jan Silovský, et al. Speaker diarization of broadcast streams using two-stage clustering based on i-vectors and cosine distance scoring, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] Yang Liu, et al. Comparing and Combining Modeling Techniques for Sentence Segmentation of Spoken Czech Using Textual and Prosodic Information, 2010.

[3] Yoshihiko Hayashi, et al. Speech-based and video-supported indexing of multimedia broadcast news, 2003, SIGIR '03.

[4] Jan Silovský, et al. Voice Technology to Enable Sophisticated Access to Historical Audio Archive of the Czech Radio, 2011, MM4CH.

[5] Roeland Ordelman, et al. Exploration of audiovisual heritage using audio indexing technology, 2006.

[6] Jan Nouza, et al. A System for Information Retrieval from Large Records of Czech Spoken Data, 2006, TSD.

[7] S. Chen, et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, 1998.

[8] Piret Laas. Preserving the National Heritage: Audiovisual Collections in Iceland, 2011.

[9] Jan Silovský, et al. Challenges in Speech Processing of Slavic Languages (Case Studies in Speech Recognition of Czech and Slovak), 2009, COST 2102 Training School.