Using ILP to Construct Features for Information Extraction from Semi-structured Text

Machine-generated documents containing semi-structured text are rapidly forming the bulk of the data stored in an organisation. Given a feature-based representation of such data, methods like SVMs are able to construct good models for information extraction (IE). But how are the feature definitions to be obtained in the first place? (We are referring here to the representation problem; selecting good features from the ones defined comes later.) So far, features have been defined manually or by using special-purpose programs, and neither approach scales well to the heterogeneity of the data or to new domain-specific information. We suggest that Inductive Logic Programming (ILP) could assist in this. Specifically, we demonstrate the use of ILP to define features for seven IE tasks using two disparate sources of information. Our findings are as follows: (1) the ILP system is able to efficiently identify large numbers of good features, and the time taken to identify the features is typically comparable to the time taken to construct the predictive model; and (2) SVM models constructed with these ILP features are better than the best reported to date that rely heavily on hand-crafted features. For the ILP practitioner, we also present evidence supporting the claim that, for IE tasks, using an ILP system to assist in constructing an extensional representation of text data (in the form of features and their values) is better than using it to construct intensional models for the tasks (in the form of rules for information extraction).
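To make the idea of an extensional representation concrete, the following is a minimal sketch in Python, not the authors' system. It assumes each feature is a boolean test on a token and its neighbours, standing in for the kind of clause an ILP system might find. The feature names (`is_capitalised`, `prev_is`, `next_is_number`) and the example sentence are hypothetical and are not taken from the paper. Each token becomes a row of 0/1 feature values, which is the form a propositional learner such as an SVM would consume.

```python
from typing import Callable, List, Sequence

# A feature is a boolean test on the token at position i and its context.
# These hand-written tests are hypothetical stand-ins for ILP-found clauses.
Feature = Callable[[Sequence[str], int], bool]


def is_capitalised(tokens: Sequence[str], i: int) -> bool:
    """True if the token starts with an upper-case letter."""
    return tokens[i][:1].isupper()


def prev_is(word: str) -> Feature:
    """Build a feature that is true if the previous token equals `word`."""
    def feature(tokens: Sequence[str], i: int) -> bool:
        return i > 0 and tokens[i - 1].lower() == word
    return feature


def next_is_number(tokens: Sequence[str], i: int) -> bool:
    """True if the following token consists only of digits."""
    return i + 1 < len(tokens) and tokens[i + 1].isdigit()


# Hypothetical feature set; an ILP system would typically propose many more.
FEATURES: List[Feature] = [is_capitalised, prev_is("speaker:"), next_is_number]


def featurise(tokens: Sequence[str]) -> List[List[int]]:
    """Return one row of 0/1 feature values per token."""
    return [[int(f(tokens, i)) for f in FEATURES] for i in range(len(tokens))]


tokens = "Speaker: Alice Smith Room 301".split()
rows = featurise(tokens)
for token, row in zip(tokens, rows):
    print(token, row)
```

The resulting rows, paired with per-token labels, could then be passed to any propositional learner. Under this reading, the ILP system's job is only to propose the feature definitions, not to produce the final extraction rules.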
