Investigating morphological decomposition for transcription of Arabic broadcast news and broadcast conversation data

One of the challenges of Arabic speech recognition is dealing with the language's very large lexical variety. Morphological decomposition has been proposed to address this problem: it increases lexical coverage and thereby reduces errors caused by words that are unknown to the system. In our previous attempts to develop an Arabic speech-to-text (STT) transcription system with morphological decomposition, the word error rate increased by about 2% absolute relative to a comparable word-based system. Based on an error analysis and a comparison of our approach with that of other sites, two modifications were made. The first was to leave the most frequent words undecomposed. The second was to leave the prefix 'Al' attached to words starting with a solar consonant: because of assimilation with the following consonant, deletion of the prefix was one of the most frequent errors. With these modifications, comparable recognition performance was achieved using word-based and morphologically decomposed language models, and since the two systems make different errors, combining them gave a performance gain.
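
The two modifications can be illustrated with a minimal sketch. It is not the system's actual decomposition procedure, which the abstract does not specify. It assumes that words are written in Buckwalter transliteration, that only the prefix 'Al' is split off, and that a split prefix is marked as `Al+`. The function name `decompose`, the `Al+` marker, and the example frequent-word set are hypothetical. The sketch also does not check whether an initial 'Al' is really the definite article.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumption: words are given in Buckwalter transliteration.

# The 14 solar (sun) consonants in Buckwalter transliteration.
SOLAR_CONSONANTS = set("tvd*rzs$SDTZln")


def decompose(word, frequent_words):
    """Return the list of recognition units for `word`.

    Rule 1: words in `frequent_words` are never decomposed.
    Rule 2: the prefix 'Al' is not split off before a solar consonant.
    """
    # Rule 1: keep the most frequent words whole.
    if word in frequent_words:
        return [word]

    # Only the 'Al' prefix is handled in this sketch.
    if word.startswith("Al") and len(word) > 2:
        # Rule 2: before a solar consonant, keep the prefix attached.
        if word[2] in SOLAR_CONSONANTS:
            return [word]
        # Otherwise split the prefix ("Al+" is a hypothetical marker).
        return ["Al+", word[2:]]

    # No applicable prefix: leave the word unchanged.
    return [word]


# Hypothetical frequent-word list; a real one would come from corpus counts.
frequent = {"Alywm"}

print(decompose("Alqmr", frequent))  # ['Al+', 'qmr']  (q is not solar)
print(decompose("Al$ms", frequent))  # ['Al$ms']       ($ is solar)
print(decompose("Alywm", frequent))  # ['Alywm']       (frequent word)
```

Keeping 'Al' attached before solar consonants means the recognizer never has to hypothesize a separate, assimilated prefix unit, which is the unit the error analysis found was most often deleted.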
