Automatic Speech-to-Text Transcription in Arabic

The Arabic language presents a number of challenges for speech recognition, arising in part from the significant differences in the spoken and written forms, in particular the conventional form of texts being non-vowelized. Being a highly inflected language, the Arabic language has a very large lexical variety and typically with several possible (generally semantically linked) vowelizations for each written form. This article summarizes research carried out over the last few years on speech-to-text transcription of broadcast data in Arabic. The initial research was oriented toward processing of broadcast news data in Modern Standard Arabic, and has since been extended to address a larger variety of broadcast data, which as a consequence results in the need to also be able to handle dialectal speech. While standard techniques in speech recognition have been shown to apply well to the Arabic language, taking into account language specificities help to significantly improve system performance.

[1]  John A. Goldsmith,et al.  Unsupervised Learning of the Morphology of a Natural Language , 2001, CL.

[2]  Richard M. Schwartz,et al.  A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3]  Wu Chou,et al.  Pattern Recognition in Speech and Language Processing , 2002 .

[4]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[5]  Jong Kyoung Kim,et al.  Speech recognition , 1983, 1983 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[6]  J. Xu,et al.  Audio Indexing of Arabic broadcast news , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Mathias Creutz,et al.  Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0 , 2005 .

[8]  Steve Young,et al.  Corpus-based methods in language and speech processing , 1997 .

[9]  Wayne H. Ward,et al.  Speech recognition , 1997 .

[10]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[11]  Jean-Luc Gauvain,et al.  Partitioning and transcription of broadcast news data , 1998, ICSLP.

[12]  Chafic Mokbel,et al.  On the use of morphological constraints in n-gram statistical language model , 2005, INTERSPEECH.

[13]  Philip C. Woodland,et al.  Particle-based language modelling , 2000, INTERSPEECH.

[14]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[15]  Jean-Luc Gauvain,et al.  Speech Processing for Audio Indexing , 2008, GoTAL.

[16]  GauvainJean-Luc,et al.  Automatic Speech-to-Text Transcription in Arabic , 2009 .

[17]  Tanja Schultz,et al.  Turkish LVCSR: towards better speech recognition for agglutinative languages , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[18]  Andreas Stolcke,et al.  Recent innovations in speech-to-text transcription at SRI-ICSI-UW , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[19]  Bing Xiang,et al.  Morphological Decomposition for Arabic Broadcast News Transcription , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[20]  Martine Adda-Decker A corpus-based decompounding algorithm for German lexical modeling in LVCSR , 2003, INTERSPEECH.

[21]  N. H. Beebe A Complete Bibliography of ACM Transactions on Asian Language Information Processing , 2007 .

[22]  Sherif Abdou,et al.  Recent progress in Arabic broadcast news transcription at BBN , 2005, INTERSPEECH.

[23]  Jeff A. Bilmes,et al.  Novel approaches to Arabic speech recognition: report from the 2002 Johns-Hopkins Summer Workshop , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[24]  Lori Lamel,et al.  The Use of Lexica in Automatic Speech Recognition , 2000 .

[25]  Jean-Luc Gauvain,et al.  Transcribing broadcast data using MLP features , 2008, INTERSPEECH.

[26]  Zellig S. Harris,et al.  From Phoneme to Morpheme , 1955 .

[27]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[28]  Jean-Luc Gauvain,et al.  Training Neural Network Language Models on Very Large Corpora , 2005, HLT.

[29]  Jean-Luc Gauvain,et al.  Arabic Broadcast News Transcription Using a One Million Word Vocalized Vocabulary , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[30]  L. Lamel,et al.  Large-vocabulary continuous speech recognition: advances and applications , 2000, Proceedings of the IEEE.

[31]  Jean-Luc Gauvain,et al.  Transcription of arabic broadcast news , 2004, INTERSPEECH.

[32]  Frantisek Grézl,et al.  Optimizing bottle-neck features for lvcsr , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[33]  Z. Harris From Phoneme to Morpheme , 1955 .

[34]  Andreas Stolcke,et al.  Using MLP features in SRI's conversational speech recognition system , 2005, INTERSPEECH.

[35]  Lori Lamel,et al.  Text normalization and speech recognition in French , 1997, EUROSPEECH.

[36]  Jean-Luc Gauvain,et al.  On the Use of MLP Features for Broadcast News Transcription , 2008, TSD.

[37]  Andreas Stolcke,et al.  Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.

[38]  Jean-Luc Gauvain,et al.  Modeling vowels for Arabic BN transcription , 2005, INTERSPEECH.

[39]  Andreas Stolcke,et al.  Morphology-based language modeling for arabic speech recognition , 2004, INTERSPEECH.

[40]  Holger Schwenk,et al.  Continuous space language models , 2007, Comput. Speech Lang..