Learning and Inference for Information Extraction

Information extraction is the process of extracting a limited set of semantic concepts from text documents and presenting them in an organized way. Unlike several other natural language tasks, information extraction has a direct impact on end-user applications. Despite its importance, information extraction remains a difficult task because of the inherent complexity and ambiguity of human languages. Moreover, mutual dependencies between the local predictions of the target concepts further increase the difficulty of the task. To advance information extraction technology, we develop general approaches to two aspects of the problem: relational feature generation and global inference with classifiers.

It has been convincingly argued that relational learning is well suited to training complex natural language systems. We propose a relational feature generation approach that enables relational learning with propositional learning algorithms. In particular, we develop a relational representation language that produces features in a data-driven way. The resulting features capture the relational structure of a given domain and therefore allow the learning algorithms to learn relational definitions of the target concepts effectively.

Although the learned classifiers can be used to predict the target concepts directly, imperfect classifiers often assign labels to different target variables that conflict with one another. We propose an inference framework that corrects mistakes in the local predictions by combining those predictions with task-dependent constraints to produce the best global assignment. This inference framework can be modeled as a Bayesian network or as an integer linear program. The proposed learning and inference frameworks have been applied to a variety of information extraction tasks, including entity extraction, entity/relation recognition, and semantic role labeling.
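
As a rough illustration of data-driven relational feature generation (a sketch under stated assumptions, not the representation language developed in this work), the Python code below assumes a sentence is a list of tokens with hypothetical `word` and `pos` attributes. Hypothetical feature-generating functions instantiate relational patterns, namely a neighbour at a given offset and a conjunction of adjacent part-of-speech tags, only where an example actually contains them. The resulting feature strings are then mapped to integer ids so that a propositional learner, such as a perceptron or Winnow, can treat each example as a sparse Boolean vector.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Token:
    word: str
    pos: str  # part-of-speech tag (assumed to be given)


def token_features(tok: Token) -> Set[str]:
    """Primitive features of a single token (hypothetical 'sensors')."""
    return {f"word={tok.word.lower()}", f"pos={tok.pos}"}


def window_features(tokens: List[Token], i: int, width: int = 2) -> Set[str]:
    """Relational features: primitive features of each neighbour within
    `width` positions, tagged with the neighbour's offset from token i."""
    feats: Set[str] = set()
    for off in range(-width, width + 1):
        j = i + off
        if off == 0 or not 0 <= j < len(tokens):
            continue
        for f in token_features(tokens[j]):
            feats.add(f"{off:+d}:{f}")
    return feats


def conjunction_features(tokens: List[Token], i: int) -> Set[str]:
    """Conjunction of the left neighbour's POS tag with token i's POS tag."""
    if i == 0:
        return set()
    return {f"pos[-1]&pos[0]={tokens[i - 1].pos}&{tokens[i].pos}"}


def generate(tokens: List[Token], i: int) -> Set[str]:
    """Only features instantiated by this example are produced, so the
    feature space grows with the data instead of being fixed in advance."""
    return (token_features(tokens[i])
            | window_features(tokens, i)
            | conjunction_features(tokens, i))


def vectorize(feats: Set[str], lexicon: Dict[str, int]) -> List[int]:
    """Map feature strings to integer ids (adding unseen ones to the
    lexicon) and return the sorted ids of the active features."""
    return sorted(lexicon.setdefault(f, len(lexicon)) for f in feats)


tokens = [Token("Mr.", "NNP"), Token("Smith", "NNP"),
          Token("visited", "VBD"), Token("Paris", "NNP")]
feats = generate(tokens, 1)
lexicon: Dict[str, int] = {}
ids = vectorize(feats, lexicon)
```

For the token `Smith`, this yields nine active features, including `-1:word=mr.`, `+2:pos=NNP`, and `pos[-1]&pos[0]=NNP&NNP`. A richer relational language would add further relations (for example, over a parse tree), but the principle is the same: the relational structure is compiled into propositional features that a linear learner can weight.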

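The global inference step can be illustrated on a toy entity/relation problem. The sketch below is a stand-in rather than the framework itself: the label sets, the probabilities produced by hypothetical local classifiers, and the single constraint (a `born_in` relation requires a person as its first argument and a location as its second) are all invented for illustration. It scores an assignment by the sum of log-probabilities and, instead of calling an integer linear programming solver, enumerates every assignment that satisfies the constraint. Enumeration returns the same optimum an ILP solver would for this objective, but its cost grows exponentially with the number of variables. An ILP encoding would instead use 0/1 indicator variables x(v, l), an equality requiring exactly one label per variable, and linear inequalities such as x(R12, born_in) <= x(E2, loc).

```python
import itertools
import math
from typing import Callable, Dict, Optional

Assignment = Dict[str, str]

# Hypothetical local classifier outputs: variable -> {label: probability}.
LOCAL: Dict[str, Dict[str, float]] = {
    "E1": {"per": 0.6, "loc": 0.3, "other": 0.1},
    "E2": {"per": 0.55, "loc": 0.4, "other": 0.05},
    "R12": {"born_in": 0.7, "none": 0.3},
}


def consistent(a: Assignment) -> bool:
    """Task constraint: born_in(E1, E2) needs E1 = per and E2 = loc."""
    if a["R12"] == "born_in":
        return a["E1"] == "per" and a["E2"] == "loc"
    return True


def local_argmax(local: Dict[str, Dict[str, float]]) -> Assignment:
    """Predict each variable independently, ignoring the constraints."""
    return {v: max(scores, key=scores.get) for v, scores in local.items()}


def global_inference(local: Dict[str, Dict[str, float]],
                     constraint: Callable[[Assignment], bool]) -> Optional[Assignment]:
    """Return the constraint-satisfying assignment with the highest total
    log-probability, found by exhaustive enumeration (an ILP stand-in)."""
    variables = list(local)
    best: Optional[Assignment] = None
    best_score = -math.inf
    for labels in itertools.product(*(local[v] for v in variables)):
        a = dict(zip(variables, labels))
        if not constraint(a):
            continue
        score = sum(math.log(local[v][a[v]]) for v in variables)
        if score > best_score:
            best, best_score = a, score
    return best


print(local_argmax(LOCAL))                  # {'E1': 'per', 'E2': 'per', 'R12': 'born_in'}
print(global_inference(LOCAL, consistent))  # {'E1': 'per', 'E2': 'loc', 'R12': 'born_in'}
```

Predicted independently, E2 is labeled `per`, which makes the confident `born_in` prediction inconsistent. The global assignment keeps `born_in` and changes E2 to `loc`, because log 0.6 + log 0.4 + log 0.7 (about -1.78) beats the best assignment without the relation, log 0.6 + log 0.55 + log 0.3 (about -2.31).
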