A logic-based relational learning approach to relation extraction: The OntoILPER system

Abstract

Relation Extraction (RE), the task of detecting and characterizing semantic relations between entities in text, has gained much importance in the last two decades, mainly in the biomedical domain. Many papers have been published on RE using supervised machine learning techniques. Most of these techniques rely on statistical methods, such as feature-based and tree-kernel-based methods. Such statistical learning techniques usually assume a propositional hypothesis space for representing examples, i.e., they employ an attribute–value representation of features. This representation has drawbacks, particularly for extracting complex relations that demand more contextual information about the instances involved: it cannot effectively capture structural information from parse trees without loss of information. In this work, we present OntoILPER, a logic-based relational learning approach to RE that uses Inductive Logic Programming to generate extraction models in the form of symbolic extraction rules. OntoILPER takes advantage of a rich relational representation of examples, which can alleviate these drawbacks. We argue that the proposed relational approach is more suitable for RE than statistical ones for several reasons. Moreover, OntoILPER uses a domain ontology that guides the background knowledge generation process and stores the extracted relation instances. The induced extraction rules were evaluated on three protein–protein interaction datasets from the biomedical domain, and the performance of the OntoILPER extraction models was compared with that of other state-of-the-art RE systems. The encouraging results suggest that the proposed solution is effective.
