Construction of an ontology for intelligent Arabic QA systems leveraging the Conceptual Graphs representation

The last decade has seen great interest in Arabic Natural Language Processing (NLP) applications. This interest stems from the importance of Arabic, the 6th most widespread language in the world, with more than 350 million native speakers. Some basic challenges of Arabic, namely its high inflection and derivation, Part-of-Speech (PoS) tagging, and the diacritical ambiguity of Arabic text, have now been largely addressed. However, the development of high-level, intelligent applications such as Question Answering (QA) systems is still hindered by the lack of ontologies and other semantic resources. In this paper, we present the construction of a new Arabic ontology leveraging the contents of Arabic WordNet (AWN) and Arabic VerbNet (AVN). This new resource has the advantage of combining the high lexical coverage and semantic relations between words in AWN with the formal representation of the syntactic and semantic frames of verbs in AVN. The Conceptual Graphs representation was adopted within a multi-layer platform dedicated to the development of intelligent and multi-agent systems. The resulting ontology is used to represent key concepts in questions and documents for subsequent semantic comparison. Experiments conducted in the context of the QA task show promising coverage of the processed questions and passages. The results also show an improvement in the performance of Arabic QA as measured by c@1.
