Gleaner: Creating ensembles of first-order clauses to improve recall-precision curves

Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP that are particularly suited to large, highly skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an “at least L of these K clauses” thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We formulate this problem in a relational domain, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable test-set results in a fraction of the training time.
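
The two mechanisms named above can be sketched in a few lines: keeping one good clause per region of the recall axis, and combining K clauses with an "at least L of K" vote. Sweeping L from 1 to K then traces points on a recall-precision curve. This is a minimal, hypothetical Python sketch, not Gleaner's or Aleph's implementation. It assumes each clause is reduced to the set of example IDs it covers. The `Clause` type and every function name are illustrative. Choosing the highest-precision clause within each recall bin is an assumed scoring rule.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Clause:
    """Hypothetical stand-in for a learned first-order clause: we keep only
    its name and the set of example IDs it covers (an assumption)."""
    name: str
    covers: FrozenSet[int]


def recall_precision(predicted: Set[int], positives: Set[int]) -> Tuple[float, float]:
    """Recall and precision of a set of predicted-positive example IDs."""
    tp = len(predicted & positives)
    recall = tp / len(positives) if positives else 0.0
    precision = tp / len(predicted) if predicted else 0.0
    return recall, precision


def best_clause_per_recall_bin(clauses: List[Clause], positives: Set[int],
                               num_bins: int) -> Dict[int, Clause]:
    """Split recall [0, 1] into equal-width bins and keep, in each bin, the
    clause with the highest precision (assumed in-bin scoring rule)."""
    best: Dict[int, Tuple[float, Clause]] = {}
    for clause in clauses:
        recall, precision = recall_precision(set(clause.covers), positives)
        b = min(int(recall * num_bins), num_bins - 1)  # recall 1.0 -> last bin
        if b not in best or precision > best[b][0]:
            best[b] = (precision, clause)
    return {b: c for b, (_, c) in best.items()}


def at_least_l_of_k(clauses: List[Clause], examples: Set[int], l: int) -> Set[int]:
    """Predict positive for every example covered by at least l of the K clauses."""
    return {e for e in examples if sum(e in c.covers for c in clauses) >= l}


def recall_precision_points(clauses: List[Clause], examples: Set[int],
                            positives: Set[int]) -> List[Tuple[int, float, float]]:
    """Sweep l = 1..K; each threshold yields one (l, recall, precision) point."""
    points = []
    for l in range(1, len(clauses) + 1):
        predicted = at_least_l_of_k(clauses, examples, l)
        r, p = recall_precision(predicted, positives)
        points.append((l, r, p))
    return points


if __name__ == "__main__":
    positives = {0, 1, 2, 3}            # toy data: 4 positives among 10 examples
    examples = set(range(10))
    clauses = [Clause("c1", frozenset({0, 1, 2, 5, 6})),
               Clause("c2", frozenset({0, 1, 3, 5})),
               Clause("c3", frozenset({0, 2, 7}))]
    for l, r, p in recall_precision_points(clauses, examples, positives):
        print(f"L={l}: recall={r:.2f} precision={p:.3f}")
```

On this toy data the sweep prints recall 1.00 / precision 0.571 at L=1, 0.75 / 0.750 at L=2, and 0.25 / 1.000 at L=3. Raising L trades recall for precision, which is how a single set of clauses can supply several points on the curve.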
