Relational learning techniques for natural language information extraction

The recent growth of online information available in the form of natural language documents creates a greater need for computing systems that can process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain-specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be extracted and learns patterns to extract fillers for the slots in the template. This proposal presents initial results on a small corpus of computer-related job postings with a preliminary version of Rapier. Future research will involve several enhancements to Rapier as well as more thorough testing on several domains and extension to additional natural language processing tasks. We intend to extend the rule representation and algorithm to allow for more types of constraints than are currently supported. We also plan to incorporate active learning (sample selection) methods, specifically query by committee, into Rapier. These methods have the potential to substantially reduce the amount of annotation required. We will explore the issue of distinguishing relevant and irrelevant messages, since Rapier currently extracts from any message given to it, assuming that all are relevant.
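Query by committee, the sample-selection method named above, can be sketched as follows: a committee of models labels each unannotated example, and the examples on which the members disagree most are sent to a human annotator first. This is a generic, hypothetical sketch, not Rapier's planned implementation: how committee members would be built from Rapier rule sets is not specified here, so each member is just an arbitrary callable, and disagreement is scored with vote entropy, one common choice of measure. The names `vote_entropy` and `select_by_committee` are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Entropy, in bits, of the committee's votes for one example.
    0 means the vote was unanimous; higher values mean more disagreement."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_by_committee(
    pool: Sequence[T],
    committee: Sequence[Callable[[T], Hashable]],
    k: int,
) -> List[T]:
    """Return the k unlabeled examples on which the committee disagrees most;
    these are the ones a human would be asked to annotate next."""
    scored = [
        (vote_entropy([member(x) for member in committee]), i)
        for i, x in enumerate(pool)
    ]
    # Highest disagreement first; ties keep the original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[i] for _, i in scored[:k]]


# Usage with a toy committee of threshold classifiers (hypothetical members).
committee = [lambda x: x > 2, lambda x: x > 5, lambda x: x > 8]
print(select_by_committee([1, 4, 7, 10], committee, k=2))  # [4, 7]
```

Examples 1 and 10 receive unanimous votes and are skipped, while 4 and 7 split the committee and are selected for annotation. In an extraction setting the votes could instead be the fillers each member extracts from a document, since any hashable output works as a vote.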
We also intend to run much larger tests with Rapier on multiple domains, including the terrorism domain from the third and fourth Message Understanding Conferences, which will allow comparison against other systems. Finally, we plan to demonstrate the generality of Rapier's representation and algorithm by applying it to other natural language processing tasks such as word sense disambiguation.
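To make the learning setup concrete (documents paired with filled templates, from which patterns are learned that extract slot fillers), here is a minimal, hypothetical sketch of pattern-based slot filling. It assumes a simplified rule with three parts: word constraints on the tokens before the filler, a constraint on the filler tokens themselves, and word constraints on the tokens after it. The names `PatternItem`, `ExtractionRule`, `most_specific_rule`, and `generalize` are illustrative, and the union-of-word-sets generalization is a crude stand-in, not Rapier's actual representation or learning algorithm.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PatternItem:
    """Constraint on one token: its lowercased form must be in `words`.
    An empty set means "any word" (a hypothetical simplification)."""
    words: Set[str]

    def matches(self, token: str) -> bool:
        return not self.words or token.lower() in self.words


@dataclass
class ExtractionRule:
    """Hypothetical three-part rule: context before the filler, a constraint
    applied to every filler token, and context after the filler."""
    pre_filler: List[PatternItem]
    filler: PatternItem
    max_filler_len: int
    post_filler: List[PatternItem]


def _match_seq(items: List[PatternItem], tokens: List[str], start: int) -> bool:
    """True if `items` match `tokens` position by position, beginning at `start`."""
    if start < 0 or start + len(items) > len(tokens):
        return False
    return all(item.matches(tokens[start + i]) for i, item in enumerate(items))


def apply_rule(rule: ExtractionRule, tokens: List[str]) -> List[str]:
    """Return every filler string that `rule` extracts from a tokenized document."""
    fillers = []
    for start in range(len(tokens)):
        # The pre-filler pattern must end immediately before the candidate filler.
        if not _match_seq(rule.pre_filler, tokens, start - len(rule.pre_filler)):
            continue
        for length in range(1, rule.max_filler_len + 1):
            end = start + length
            if end > len(tokens) or not rule.filler.matches(tokens[end - 1]):
                break  # any longer filler would contain this failing token too
            # The post-filler pattern must begin immediately after the filler.
            if _match_seq(rule.post_filler, tokens, end):
                fillers.append(" ".join(tokens[start:end]))
    return fillers


def most_specific_rule(tokens: List[str], filler: str, context: int = 1) -> ExtractionRule:
    """Build a rule from one (document, slot filler) pair that matches only the
    observed words around the filler's first occurrence, using `context`
    literal tokens on each side."""
    target = filler.split()
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] == target:
            end = start + len(target)
            pre = [PatternItem({t.lower()}) for t in tokens[max(0, start - context):start]]
            post = [PatternItem({t.lower()}) for t in tokens[end:end + context]]
            return ExtractionRule(pre, PatternItem({t.lower() for t in target}), len(target), post)
    raise ValueError("filler not found in document")


def _union(a: PatternItem, b: PatternItem) -> PatternItem:
    # An empty set already means "any word", so it absorbs the other constraint.
    if not a.words or not b.words:
        return PatternItem(set())
    return PatternItem(a.words | b.words)


def generalize(r1: ExtractionRule, r2: ExtractionRule) -> ExtractionRule:
    """Merge two rules by allowing either rule's words at each position.
    This sketch only handles rules whose contexts have equal lengths."""
    if len(r1.pre_filler) != len(r2.pre_filler) or len(r1.post_filler) != len(r2.post_filler):
        raise ValueError("context lengths differ")
    return ExtractionRule(
        [_union(a, b) for a, b in zip(r1.pre_filler, r2.pre_filler)],
        _union(r1.filler, r2.filler),
        max(r1.max_filler_len, r2.max_filler_len),
        [_union(a, b) for a, b in zip(r1.post_filler, r2.post_filler)],
    )


# Usage: learn from two (posting, filled "city" slot) pairs, then extract.
doc1 = "located in Austin , TX".split()
doc2 = "office in Dallas , TX".split()
rule = generalize(most_specific_rule(doc1, "Austin"), most_specific_rule(doc2, "Dallas"))
print(apply_rule(rule, "jobs in Dallas , TX".split()))   # ['Dallas']
print(apply_rule(rule, "jobs in Houston , TX".split()))  # []
```

The merged rule extracts "Dallas" from a new posting but nothing from "Houston", which illustrates why a bare disjunction of observed words generalizes too conservatively and why richer kinds of constraints matter for this task.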