Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings

Language model and acoustic model adaptation play an important role in improving the performance and robustness of automatic speech recognition, especially when developing domain-specific, gender-dependent, or user-adapted systems. This paper addresses language model speaker adaptation for transcribing Slovak parliament proceedings for individual speakers. Building on current research, we have developed a framework that combines multiple speech recognition outputs with acoustic and language model adaptation at different stages. Preliminary results show a significant relative decrease in model perplexity of 45 % to 74 % and a decrease in speech recognition word error rate of 29 % to 43 %, for male and female speakers respectively.
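For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how perplexity, word error rate, and a relative reduction between a baseline and an adapted system are commonly computed. The function names and all numbers in the usage note are hypothetical illustrations. They are not taken from the paper's toolkit or results, and the paper's exact scoring setup is not specified here.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test text, given the model probability of each word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_prob)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, adapted):
    """Relative decrease of a metric, as a fraction of the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - adapted) / baseline
```

As a usage illustration with made-up values, `relative_reduction(20.0, 15.0)` gives 0.25, that is, a 25 % relative reduction. A relative figure always depends on the baseline it is measured against.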
