Mapping Arabic WordNet synsets to Wikipedia articles using monolingual and bilingual features

The alignment of WordNet and Wikipedia has received wide attention from researchers in computational linguistics, who use it to build new lexical knowledge sources or to enrich the semantic information of WordNet entities. The main challenge of this alignment is handling synonymy and ambiguity in the contents of two units drawn from different sources. This paper therefore introduces a mapping method that links an Arabic WordNet synset to its corresponding article in Wikipedia. The method uses monolingual and bilingual features to overcome the lack of semantic information in Arabic WordNet. To evaluate the method, an Arabic mapping data set containing 1,291 synset–article pairs was compiled. The experimental analysis shows that the proposed method achieves promising results and outperforms state-of-the-art methods that depend only on monolingual features. The mapping method has also been used to increase the coverage of Arabic WordNet by inserting new synsets from Wikipedia.
