Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction

Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. We present an algorithm, RAPIER, that uses pairs of sample documents and filled templates to induce pattern-match rules that directly extract fillers for the slots in the template. RAPIER is a bottom-up learning algorithm that incorporates techniques from several inductive logic programming systems. We have implemented the algorithm in a system that allows patterns to have constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.
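To make the idea of a pattern-match rule concrete, the following is a minimal sketch of how a rule with constraints on the filler and the surrounding text might be applied to a tagged document. It is an illustration under stated assumptions, not RAPIER's actual rule representation: the names `Element`, `Rule`, and `extract` are hypothetical, each pattern element is assumed to match exactly one token, semantic-class constraints are omitted, and the learning step that induces such rules from documents and filled templates is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass
class Element:
    """One pattern element (hypothetical). None means 'unconstrained'."""
    words: Optional[Set[str]] = None  # allowed lowercase word forms
    tags: Optional[Set[str]] = None   # allowed part-of-speech tags

    def matches(self, token: Token) -> bool:
        word, tag = token
        if self.words is not None and word.lower() not in self.words:
            return False
        if self.tags is not None and tag not in self.tags:
            return False
        return True


@dataclass
class Rule:
    """Pre-filler, filler, and post-filler patterns for one template slot."""
    pre: List[Element]
    filler: List[Element]
    post: List[Element]

    def extract(self, tokens: List[Token]) -> List[str]:
        # Simplifying assumption: every element consumes exactly one token,
        # so the whole rule matches a fixed-length window of the document.
        pattern = self.pre + self.filler + self.post
        start = len(self.pre)
        end = start + len(self.filler)
        found = []
        for i in range(len(tokens) - len(pattern) + 1):
            window = tokens[i:i + len(pattern)]
            if all(e.matches(t) for e, t in zip(pattern, window)):
                # Only the filler span is returned as the slot value.
                found.append(" ".join(w for w, _ in window[start:end]))
        return found


# Made-up example: a proper noun preceded by "in" and followed by a comma.
tokens = [("located", "VBN"), ("in", "IN"), ("Atlanta", "NNP"),
          (",", ","), ("Georgia", "NNP"), (".", ".")]
rule = Rule(pre=[Element(words={"in"})],
            filler=[Element(tags={"NNP"})],
            post=[Element(words={","})])
cities = rule.extract(tokens)
print(cities)
```

In this sketch the surrounding-text constraints act only as context: "Georgia" also carries the tag NNP, but it is not extracted because it is not preceded by "in" and followed by a comma.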
