Diversicon: Pluggable Lexical Domain Knowledge

Natural language understanding is a key task in a wide range of applications targeting data interoperability or analytics. For the analysis of domain-specific data, specialised knowledge resources (terminologies, grammars, word vector models, lexical databases) are necessary. The heterogeneity of such resources is, however, a major obstacle to their efficient use, especially in combination. This paper presents the open-source Diversicon Framework, which helps application developers find, integrate, and access lexical domain knowledge, both symbolic and statistical, in a unified manner. The major components of the framework are: (1) an API and domain knowledge model that allow applications to retrieve domain knowledge through a common interface from a diversity of resource types, (2) implementations of the API for some of the most commonly used symbolic and statistical knowledge sources, (3) a domain-aware knowledge base that helps integrate static lexico-semantic resources, and (4) an online catalogue that either hosts or links to existing resources from multiple domains. Support for Diversicon is already integrated into two of the most popular ontology matcher applications. We exploit this integration to validate the framework and to demonstrate its use in an example study that evaluates the effect of several common-sense and domain knowledge resources on a medical ontology matching task.
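To make the idea of a common interface over symbolic and statistical resources more concrete, the following is a minimal sketch in Python. It is not the Diversicon API: the class names (`DomainKnowledgeSource`, `SymbolicSource`, `StatisticalSource`), the `relatedness` method, the `best_score` helper, and the toy data are all hypothetical and chosen only for illustration. It assumes that an application asks each source for a relatedness score between two terms within a named domain.

```python
from abc import ABC, abstractmethod
from math import sqrt


class DomainKnowledgeSource(ABC):
    """Hypothetical common interface; NOT the actual Diversicon API."""

    @abstractmethod
    def relatedness(self, term_a: str, term_b: str, domain: str) -> float:
        """Return a score in [0, 1] for two terms within a domain."""


class SymbolicSource(DomainKnowledgeSource):
    """Toy symbolic resource: explicit synonym pairs per domain."""

    def __init__(self, synonyms):
        # synonyms maps a domain name to a set of frozenset({term, term})
        self._synonyms = synonyms

    def relatedness(self, term_a, term_b, domain):
        if term_a == term_b:
            return 1.0
        pair = frozenset((term_a, term_b))
        return 1.0 if pair in self._synonyms.get(domain, set()) else 0.0


class StatisticalSource(DomainKnowledgeSource):
    """Toy statistical resource: cosine similarity of per-domain vectors."""

    def __init__(self, vectors):
        # vectors maps a domain name to {term: list of floats}
        self._vectors = vectors

    def relatedness(self, term_a, term_b, domain):
        table = self._vectors.get(domain, {})
        if term_a not in table or term_b not in table:
            return 0.0
        u, v = table[term_a], table[term_b]
        dot = sum(x * y for x, y in zip(u, v))
        norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
        # Clamp negative cosines to 0 so every source reports in [0, 1].
        return max(0.0, dot / norm) if norm else 0.0


def best_score(sources, term_a, term_b, domain):
    """Hypothetical helper: highest score reported by any source."""
    scores = (s.relatedness(term_a, term_b, domain) for s in sources)
    return max(scores, default=0.0)


# Toy data, invented for this sketch only.
symbolic = SymbolicSource(
    {"medicine": {frozenset(("heart attack", "myocardial infarction"))}}
)
statistical = StatisticalSource(
    {"medicine": {"fever": [1.0, 0.0], "pyrexia": [1.0, 0.0], "bone": [0.0, 1.0]}}
)
sources = [symbolic, statistical]
```

With this shape, a caller such as an ontology matcher would depend only on the abstract interface, for example `best_score(sources, "fever", "pyrexia", "medicine")`, and would not need to know whether the answer came from a lexical database or a vector model.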
