Morpholexical and Discriminative Language Models for Turkish Automatic Speech Recognition

This paper introduces two complementary language modeling approaches for morphologically rich languages, aiming to alleviate the out-of-vocabulary (OOV) word problem and to exploit morphology as a knowledge source. The first model, the morpholexical language model, is a generative $n$-gram model whose modeling units are lexical-grammatical morphemes rather than the commonly used words or statistical sub-words. This paper also proposes a novel approach for integrating morphology as a knowledge source into an automatic speech recognition (ASR) system within the finite-state transducer framework. We accomplish this by building a morpholexical search network, obtained by composing the lexical transducer of a computational lexicon with a morpholexical language model. The second model is a linear reranking model trained discriminatively with a variant of the perceptron algorithm using morpholexical features. This variant, the WER-sensitive perceptron, is shown to perform better for reranking $n$-best candidates obtained with the generative model. We apply the proposed models to a Turkish broadcast news transcription task and give experimental results. The morpholexical model leads to an elegant morphology-integrated search network with unlimited vocabulary. It is therefore highly effective in alleviating the OOV problem, and it improves the word error rate (WER) over word and statistical sub-word models by 1.8% and 0.4% absolute, respectively. The discriminatively trained morpholexical model further improves the WER of the system by 0.8% absolute.
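To make the reranking idea concrete, the following is a minimal sketch of a loss-scaled perceptron over $n$-best lists. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the update rule, so the sketch assumes that each update is scaled by the difference in word errors between the currently top-scored hypothesis and the lowest-error (oracle) hypothesis. Unigram counts of morpheme-like tokens stand in for the paper's morpholexical features, the generative model score is added with a fixed weight of 1, and all function names (`edit_distance`, `train_wer_sensitive`, `rerank`) are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def features(tokens):
    """Assumed feature map: unigram counts of the hypothesis tokens."""
    return Counter(tokens)


def score(weights, tokens, base_score):
    """Base (generative) score plus the linear reranking score."""
    feats = features(tokens)
    return base_score + sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_wer_sensitive(nbest_lists, references, epochs=5):
    """Perceptron training with updates scaled by the error difference (assumed rule)."""
    weights = {}
    for _ in range(epochs):
        for nbest, ref in zip(nbest_lists, references):
            errors = [edit_distance(ref, toks) for toks, _ in nbest]
            oracle = min(range(len(nbest)), key=lambda i: errors[i])
            best = max(range(len(nbest)),
                       key=lambda i: score(weights, nbest[i][0], nbest[i][1]))
            loss = errors[best] - errors[oracle]
            if loss > 0:
                # Move towards the oracle and away from the current best,
                # proportionally to how many extra errors the best one makes.
                for k, v in features(nbest[oracle][0]).items():
                    weights[k] = weights.get(k, 0.0) + loss * v
                for k, v in features(nbest[best][0]).items():
                    weights[k] = weights.get(k, 0.0) - loss * v
    return weights


def rerank(weights, nbest):
    """Return the token sequence with the highest combined score."""
    return max(nbest, key=lambda item: score(weights, item[0], item[1]))[0]


# Toy usage: the generative score prefers the wrong final morpheme.
reference = ["ev", "+ler", "+de"]
nbest = [(["ev", "+ler", "+den"], -1.0),
         (["ev", "+ler", "+de"], -1.5)]
weights = train_wer_sensitive([nbest], [reference])
top = rerank(weights, nbest)
```

Scaling the update by the error difference makes costly mistakes move the weights more than near-misses, which is the intuition behind loss-sensitive perceptron variants. An averaged version of the weights would normally be used in practice, which this sketch omits for brevity.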
