Morphological and syntactic features for Arabic speech recognition

In this paper, we study the use of morphological and syntactic context features to improve speech recognition for a morphologically rich language such as Arabic. We examine a variety of syntactic features, including part-of-speech tags, shallow parse tags, and exposed head words and their non-terminal labels, taken from both before and after the word to be predicted. We model these features with neural network language models (LMs), which generalize better to unseen events because they represent words and other context features in continuous space. Using morphological and syntactic features, we significantly reduce the word error rate (WER) on various test sets, including EVAL'08U, the unsequestered portion of the DARPA GALE Phase 3 evaluation test set.
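
As a rough illustration of what modeling words and other context features in continuous space can look like, the sketch below builds a toy feed-forward language model whose input concatenates an embedding for each previous word with an embedding for that word's part-of-speech tag. It is a minimal sketch under stated assumptions, not the architecture used in the paper: the class name `FactoredNNLM`, the layer sizes, the integer word and tag IDs, and the random untrained weights are all hypothetical, and only preceding-context POS tags are shown (no shallow parse tags, head words, or following context).

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random matrix stored as a list of rows (weights are untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class FactoredNNLM:
    """Toy feed-forward LM: context = previous words plus their POS tags.

    Hypothetical illustration only; sizes and names are assumptions.
    """

    def __init__(self, n_words, n_tags, emb_dim=4, hidden=8, context=2, seed=0):
        rng = random.Random(seed)
        self.context = context
        # Separate continuous-space lookup tables for words and for tags.
        self.word_emb = make_matrix(n_words, emb_dim, rng)
        self.tag_emb = make_matrix(n_tags, emb_dim, rng)
        in_dim = 2 * emb_dim * context  # word + tag embedding per position
        self.w_hidden = make_matrix(hidden, in_dim, rng)
        self.w_out = make_matrix(n_words, hidden, rng)

    def predict(self, prev_words, prev_tags):
        """Return P(next word | previous words and their tags) as a list."""
        assert len(prev_words) == len(prev_tags) == self.context
        # Concatenate word and tag embeddings for every context position.
        x = []
        for w, t in zip(prev_words, prev_tags):
            x.extend(self.word_emb[w])
            x.extend(self.tag_emb[t])
        # Hidden layer with tanh nonlinearity.
        h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
             for row in self.w_hidden]
        # Output scores followed by a numerically stable softmax.
        scores = [sum(wi * hi for wi, hi in zip(row, h)) for row in self.w_out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


# Toy vocabulary of 5 words and 3 tags; IDs are arbitrary placeholders.
model = FactoredNNLM(n_words=5, n_tags=3)
probs = model.predict(prev_words=[1, 4], prev_tags=[0, 2])
```

Because words and tags share the same dense input vector, a context whose word sequence was never seen in training can still receive a sensible prediction if its tag pattern resembles seen contexts, which is the intuition behind the generalization claim above.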
