Morphologically motivated word classes for very large vocabulary speech recognition of Finnish and Estonian

Abstract We study class-based n-gram and neural network language models for very large vocabulary speech recognition of two morphologically rich languages: Finnish and Estonian. Due to morphological processes such as derivation, inflection and compounding, the models need to be trained with vocabularies of several million word types. Class-based language modelling is then a powerful approach for alleviating data sparsity and reducing the computational load. For a very large vocabulary, bigram statistics may not be an optimal means of deriving the classes. We therefore study utilizing the output of a morphological analyzer to obtain efficient word classes. We show that efficient classes can be learned by refining the morphological classes into smaller equivalence classes using merging, splitting and exchange procedures with suitable constraints. This type of classification can improve the results, particularly when the language model training data is not very large. We also extend the previous analyses by rescoring the hypotheses obtained from a very large vocabulary recognizer using class-based neural network language models. We show that, despite the fixed vocabulary, carefully constructed classes for word-based language models can in some cases yield lower error rates than subword-based unlimited vocabulary language models.
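The exchange procedure mentioned in the abstract refines a class assignment by repeatedly moving each word to the class that maximizes the class-bigram training likelihood. The following is a minimal, illustrative sketch of that idea on a toy corpus; the corpus, the round-robin initialization, and the brute-force likelihood recomputation are assumptions for clarity (practical implementations, as in the cited clustering literature, update the count-based likelihood incrementally rather than rescanning the corpus).

```python
from collections import Counter
from math import log

def xlogx(x):
    # Convention: 0 * log 0 = 0 for maximum-likelihood count terms.
    return x * log(x) if x > 0 else 0.0

def class_bigram_ll(tokens, cls):
    # Class-dependent part of the class-bigram log-likelihood:
    #   sum_{c1,c2} N(c1,c2) log N(c1,c2)  -  2 * sum_c N(c) log N(c)
    # The word-unigram term is constant w.r.t. the classification.
    bi = Counter((cls[a], cls[b]) for a, b in zip(tokens, tokens[1:]))
    uni = Counter(cls[w] for w in tokens)
    return (sum(xlogx(n) for n in bi.values())
            - 2 * sum(xlogx(n) for n in uni.values()))

def exchange(tokens, num_classes, iters=5):
    # Round-robin initialization of words to classes (an assumption;
    # morphologically motivated initial classes could be used instead).
    vocab = sorted(set(tokens))
    cls = {w: i % num_classes for i, w in enumerate(vocab)}
    for _ in range(iters):
        for w in vocab:
            # Tentatively try each class for word w; keep the best move.
            best_c, best_ll = cls[w], class_bigram_ll(tokens, cls)
            for c in range(num_classes):
                cls[w] = c
                ll = class_bigram_ll(tokens, cls)
                if ll > best_ll:
                    best_c, best_ll = c, ll
            cls[w] = best_c
    return cls
```

Because each word is moved only when the likelihood strictly improves, the objective is non-decreasing over iterations; the constrained variants studied in the paper additionally restrict which moves are allowed, e.g. keeping words within their morphological class.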
