Building a morpho-semantic knowledge graph for Arabic information retrieval

Abstract. In this paper, we propose to build a morpho-semantic knowledge graph from vocalized Arabic corpora. Our work focuses on classical Arabic, which has not been deeply investigated in related work. We use a tool suite that analyzes and disambiguates Arabic texts, taking short diacritics into account to reduce ambiguity. At the morphological level, we combine the Ghwanmeh stemmer and MADAMIRA, which are adapted to extract a multi-level lexicon from vocalized Arabic corpora. At the semantic level, we infer semantic dependencies between tokens by exploiting contextual knowledge extracted by a concordancer. Both morphological and semantic links are represented as compressed graphs, which are accessed through lazy methods. These graphs are mined using a measure inspired by BM25 to compute one-to-many similarity. We evaluate the morpho-semantic knowledge graph in the context of Arabic Information Retrieval (IR). Several scenarios of document indexing and query expansion are assessed. Specifically, we vary the indexing units for Arabic IR based on different levels of morphological knowledge, a challenging issue that previous research has not resolved. We also experiment with several combinations of morpho-semantic query expansion. This allows us to validate our resource and to study its impact on IR using state-of-the-art evaluation metrics.
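
To make the idea of a BM25-inspired one-to-many similarity more concrete, the following is a minimal Python sketch and not the measure defined in the paper. It assumes the graph is a plain dictionary of weighted links (the paper uses compressed graphs), treats an edge weight like a term frequency and a node's total edge weight like a document length, and scores one candidate token against many query tokens. The names `bm25_like_score` and `expand_query`, the parameters `k1` and `b`, and the transliterated toy tokens are all hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory graph: node -> {neighbor: link weight}.
Graph = Dict[str, Dict[str, float]]


def bm25_like_score(graph: Graph, candidate: str, query_terms: Iterable[str],
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Score one candidate token against many query tokens (assumed formula)."""
    n_nodes = len(graph)
    # A node's "length" is the sum of its link weights (stand-in for document length).
    lengths = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    avg_len = sum(lengths.values()) / n_nodes if n_nodes else 0.0
    cand_len = lengths.get(candidate, 0.0)
    norm = k1 * (1 - b + b * cand_len / avg_len) if avg_len else k1
    score = 0.0
    for q in query_terms:
        w = graph.get(candidate, {}).get(q, 0.0)  # link weight plays the role of tf
        if w == 0.0:
            continue
        n_q = len(graph.get(q, {}))  # number of nodes linked to q (stand-in for df)
        idf = math.log(1 + (n_nodes - n_q + 0.5) / (n_q + 0.5))
        score += idf * w * (k1 + 1) / (w + norm)
    return score


def expand_query(graph: Graph, query_terms: Iterable[str],
                 top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank the neighbors of the query tokens as expansion candidates."""
    query = list(query_terms)
    candidates = {n for q in query for n in graph.get(q, {})} - set(query)
    scored = [(c, bm25_like_score(graph, c, query)) for c in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, then by name
    return scored[:top_k]


# Toy graph with transliterated tokens; the weights are invented for illustration.
graph: Graph = {
    "kitab": {"katib": 3, "maktaba": 2, "qalam": 1},
    "katib": {"kitab": 3, "qalam": 2},
    "maktaba": {"kitab": 2},
    "qalam": {"kitab": 1, "katib": 2},
}
result = expand_query(graph, ["kitab", "katib"])
for token, score in result:
    print(token, round(score, 3))
```

In this toy graph, "qalam" ranks above "maktaba" because it is linked to both query tokens, which is the behavior a one-to-many measure is meant to reward. How the paper weights links and normalizes scores may differ from this sketch.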
