An Arabic Morphological Analyzer and Generator with Copious Features

We introduce CALIMA-Star, a feature-rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality, and more. The tool includes a fast engine that can be easily integrated into other systems, an easy-to-use API, and a web interface. CALIMA-Star also supports morphological reinflection. We evaluate CALIMA-Star against four commonly used analyzers for Arabic in terms of speed and morphological content.
