Using WikiData as a Multi-lingual Multi-dialectal Dictionary for Arabic Dialects

Since 2012, Wikidata has been developed as a freely accessible, community-generated knowledge base that not only represents the name of each item (a tree, a famous person, ...) in all available languages but also defines links (such as "component of", "born in", "instance of", ...) between items. Wikidata has since become one of the largest semantic web datasets in the world and has proven useful for solving many existing problems in computational linguistics, medicine, and many other fields. In this work, we propose converting Wikidata into a multilingual, multi-dialectal dictionary for Arabic dialects, and we explain how such a dictionary can later be used by computational linguists and computer scientists for the natural language processing of the varieties of Arabic.
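
As a rough illustration of the idea, the following minimal sketch reads one Wikidata-style item as a dictionary entry by collecting its labels for a chosen set of language codes. It is only a sketch under stated assumptions: the item id "Q0", the sample labels, the choice of language codes (ar, arz, ary, aeb), and the helper name dictionary_entry are hypothetical and were written by hand for illustration; they were not retrieved from Wikidata and do not describe the method of this work.

```python
from typing import Dict, Iterable

# Hand-written sample shaped like the "labels" part of a Wikidata entity
# (language code -> {"language": ..., "value": ...}).
# ASSUMPTION: the id "Q0" and all labels are illustrative placeholders,
# not data retrieved from Wikidata.
SAMPLE_ENTITY = {
    "id": "Q0",
    "labels": {
        "en": {"language": "en", "value": "tomato"},
        "ar": {"language": "ar", "value": "طماطم"},
        "arz": {"language": "arz", "value": "قوطة"},
        "ary": {"language": "ary", "value": "مطيشة"},
    },
}


def dictionary_entry(entity: dict, codes: Iterable[str]) -> Dict[str, str]:
    """Return {language code: label} for the requested codes.

    Codes with no label on the item are skipped, which makes gaps in
    dialect coverage visible to the caller.
    """
    labels = entity.get("labels", {})
    return {code: labels[code]["value"] for code in codes if code in labels}


# "aeb" is requested but absent from the sample, so it is left out.
entry = dictionary_entry(SAMPLE_ENTITY, ["en", "ar", "arz", "ary", "aeb"])
for code, word in entry.items():
    print(f"{code}\t{word}")
```

Skipping missing codes rather than falling back to the Modern Standard Arabic label is a deliberate choice in this sketch: it keeps dialectal entries from being silently filled with non-dialectal words.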
