Language Agnostic Dictionary Extraction

Ontologies are dynamic artifacts that evolve in both structure and content. Keeping them up to date is a critical and very expensive operation for any application that relies on Semantic Web technologies. In this paper, we focus on evolving the content of an ontology by extracting relevant instances of ontological concepts from text. The novelty of this work is that we propose a technique that (i) is completely language independent, (ii) combines statistical methods with a human in the loop, and (iii) exploits Linked Data as a bootstrapping source. Experiments on a publicly available parallel medical corpus show comparable performance regardless of the chosen language.
