Transcription System for Semi-Spontaneous Estonian Speech

This paper describes a speech-to-text system for semi-spontaneous Estonian speech. The system is trained on about 100 hours of manually transcribed speech and a text corpus of about 300 million words. Compound words are split before the language model is built and are reconstructed from the recognizer output using a hidden-event N-gram model. We use a three-pass transcription strategy, with unsupervised speaker adaptation between passes. The system achieves a word error rate of 34.6% on conference speeches and 25.6% on radio talk shows.
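For readers unfamiliar with the metric, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition. The function name `word_error_rate`, the whitespace tokenization, and the placeholder token strings are assumptions made for illustration, and the abstract does not say which scoring tool or text normalization was actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are obtained by whitespace splitting, which
    is an assumption and not necessarily how the paper scored its output.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d)
# against a four-word reference.
example_wer = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.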
