Effectiveness of Automatic Translations for Cross-Lingual Ontology Mapping

Accessing or integrating data lexicalized in different languages is a challenge. Multilingual lexical resources play a fundamental role in reducing the language barriers that hinder the mapping of concepts lexicalized in different languages. In this paper, we present a large-scale study on the effectiveness of automatic translations in supporting two key cross-lingual ontology mapping tasks: the retrieval of candidate matches and the selection of the correct matches for inclusion in the final alignment. We conduct our experiments on four large gold standards, each consisting of a pair of mapped wordnets, to cover four different language families. We categorize concepts based on their lexicalization (type of words, synonym richness, position in a subconcept graph) and analyze their distributions in the gold standards. Leveraging this categorization, we measure several aspects of translation effectiveness, such as word-translation correctness, word sense coverage, and synset and synonym coverage. Finally, we discuss several findings of our study, which we believe are helpful for the design of more sophisticated cross-lingual mapping algorithms.
