Learning Recursive Patterns for Biomedical Information Extraction

Text remains a largely unexploited source of biological information. Information Extraction (IE) techniques are needed to map this information into structured representations in which facts relating domain-relevant entities can be recognized automatically. In biomedical IE tasks, extracting patterns that model implicit relations among entities is particularly important, since biological systems intrinsically involve interactions among several entities. In this paper, we adopt an Inductive Logic Programming (ILP) approach to discover mutually recursive patterns from text. Mutual recursion allows dependencies among entities to be explored in the data and extraction models to be applied in a context-sensitive mode. In particular, IE models are discovered in the form of classification rules that encode the conditions for filling a pre-defined information template. We report an application to a real-world dataset composed of publications selected to support biologists in the automatic annotation of a genomic database.
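
To make the idea of mutually recursive extraction rules concrete, the following is a minimal sketch and not the system described in the paper. It assumes two hypothetical template slots, `mutation` and `pathology`, a hypothetical one-entry seed lexicon, and hand-written rules in place of rules learned by ILP. Each slot is defined partly in terms of the other, so a token is labelled only in the context of labels already assigned nearby. The rules are applied by naive forward chaining until no new fact is derived.

```python
# Toy sentence; tokens are identified by position. All data here is illustrative.
TOKENS = ["A3243G", "causes", "MELAS", "and", "T8993G", "causes", "NARP"]

# Hypothetical seed lexicon providing the base case of the recursion.
SEED_MUTATIONS = {"A3243G"}


def is_variant_shape(word):
    """Assumed shape test: letter, digits, letter (e.g. A3243G)."""
    return (
        len(word) >= 3
        and word[0].isalpha()
        and word[-1].isalpha()
        and word[1:-1].isdigit()
    )


def apply_rules(tokens, facts):
    """One forward-chaining step over facts of the form (slot, token_index)."""
    new = set(facts)
    n = len(tokens)
    # Base rule: mutation(i) if the token is in the seed lexicon.
    for i, word in enumerate(tokens):
        if word in SEED_MUTATIONS:
            new.add(("mutation", i))
    for slot, i in facts:
        # pathology(i+2) if mutation(i) and the next token is "causes".
        if slot == "mutation" and i + 2 < n and tokens[i + 1] == "causes":
            new.add(("pathology", i + 2))
        # mutation(i+2) if pathology(i), the next token is "and",
        # and the candidate token has a variant-like shape.
        if (
            slot == "pathology"
            and i + 2 < n
            and tokens[i + 1] == "and"
            and is_variant_shape(tokens[i + 2])
        ):
            new.add(("mutation", i + 2))
    return new


def extract(tokens):
    """Apply the rules repeatedly until a fixpoint is reached."""
    facts = set()
    while True:
        updated = apply_rules(tokens, facts)
        if updated == facts:
            return facts
        facts = updated


if __name__ == "__main__":
    for slot, i in sorted(extract(TOKENS), key=lambda f: f[1]):
        print(slot, TOKENS[i])
```

In this sketch the second mutation is found only because a pathology was recognized before it, and that pathology was found only because of the first mutation. The same token sequence without a seeded mutation yields no facts at all, which is one simple reading of applying an extraction model in a context-sensitive mode.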
