Using Part of Speech N-Grams for Improving Automatic Speech Recognition of Polish

This paper investigates the usefulness of a part-of-speech language model for the task of automatic speech recognition. The developed model uses part-of-speech tags as categories in a category-based language model. The constructed model is used to re-score the hypotheses generated by the HTK-based acoustic model. The probability of a given sequence of words is estimated using n-grams with Witten-Bell backoff. The experiments presented in this paper were carried out for Polish. The best results show that the part-of-speech-only language model, trained on a manually tagged corpus of one million words, reduces the word error rate by more than 10 percentage points.
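To make the re-scoring idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of a part-of-speech bigram model with Witten-Bell smoothing used to re-rank ASR hypotheses. The tag names, the interpolated (rather than backoff) formulation of Witten-Bell, the bigram order, and the language-model weight are all assumptions made for the example.

```python
# Sketch only: POS-tag bigram LM with Witten-Bell smoothing, used to
# re-rank ASR hypotheses by a combined acoustic + LM score.
# Tags, training data, and the LM weight below are illustrative placeholders.

import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Bigram model over POS tags with Witten-Bell interpolation."""

    def __init__(self, tagged_sentences):
        self.bigrams = defaultdict(Counter)   # history tag -> Counter of next tags
        self.unigrams = Counter()
        for tags in tagged_sentences:
            padded = ["<s>"] + tags + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[prev][cur] += 1
                self.unigrams[cur] += 1
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams)

    def unigram_prob(self, tag):
        # Add-one smoothing at the lowest order to avoid zero probabilities.
        return (self.unigrams[tag] + 1) / (self.total + self.vocab_size)

    def prob(self, tag, prev):
        counts = self.bigrams.get(prev)
        if not counts:
            return self.unigram_prob(tag)
        n = sum(counts.values())   # tokens observed after `prev`
        t = len(counts)            # distinct tags observed after `prev`
        # Witten-Bell: reserve t/(n+t) of the mass for unseen continuations.
        return (counts[tag] + t * self.unigram_prob(tag)) / (n + t)

    def logprob(self, tags):
        padded = ["<s>"] + tags + ["</s>"]
        return sum(math.log(self.prob(cur, prev))
                   for prev, cur in zip(padded, padded[1:]))


def rescore(hypotheses, model, lm_weight=0.5):
    """Pick the best (acoustic_log_score, pos_tags) hypothesis after adding the POS LM score."""
    return max(hypotheses,
               key=lambda h: h[0] + lm_weight * model.logprob(h[1]))


if __name__ == "__main__":
    # Toy training data: sentences already mapped to POS tags by a tagger.
    train = [["subst", "fin", "subst"], ["subst", "fin", "adj", "subst"]]
    lm = WittenBellBigram(train)
    hyps = [(-12.0, ["subst", "fin", "subst"]),
            (-11.5, ["subst", "subst", "subst"])]
    print(rescore(hyps, lm))
```

In practice the acoustic scores would come from the recognizer's word lattice or n-best list, and the re-scored tag sequences would be produced by running a tagger over each hypothesis; the weight balancing the two scores would be tuned on held-out data.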
