Paper Information - Transcription System for Semi-Spontaneous Estonian Speech

This paper describes a speech-to-text system for semi-spontaneous Estonian speech. The system is trained on about 100 hours of manually transcribed speech and a text corpus of about 300 million words. Compound words are split into their constituents before the language model is built, and are reconstructed from the recognizer output using a hidden-event N-gram model. Transcription uses a three-pass strategy, with unsupervised speaker adaptation between passes. The system achieves a word error rate of 34.6% on conference speeches and 25.6% on radio talk shows.
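The word error rates quoted above follow the standard definition: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the scoring tool used in the paper; the function name `word_error_rate` and the example sentences are assumptions made for illustration, and the sketch applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
example_wer = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors while the denominator is the reference length, this ratio can exceed 1.0 for very poor hypotheses.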

Tanel Alumäe
