Numerical Relation Extraction with Minimal Supervision

We study the novel task of numerical relation extraction, whose goal is to extract relations in which one of the arguments is a number or a quantity (e.g., atomic number(Aluminium, 13), inflation rate(India, 10.9%)). This task presents challenges not found in standard Information Extraction (IE), such as the difficulty of matching numbers in distant supervision and the importance of units. We design two extraction systems that require minimal human supervision per relation: (1) NumberRule, a rule-based extractor, and (2) NumberTron, a probabilistic graphical model. We find that both systems dramatically outperform MultiR, a state-of-the-art non-numerical IE model, improving F-score by up to 25 points.
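To make the number-matching difficulty concrete, here is a minimal Python sketch of how a distant-supervision step might decide whether a quantity mention in text agrees with a knowledge-base value. It is an illustration only, not the matching procedure used by NumberRule or NumberTron: the function names, the unit alias table, and the 5% relative tolerance are all assumptions introduced for this example.

```python
import re
from typing import Optional, Tuple

# Hypothetical unit aliases (an assumption for this sketch); a real system
# would need a far richer unit inventory.
UNIT_ALIASES = {"%": "percent", "percent": "percent", "per cent": "percent"}

# A number, optionally followed by a percent-style unit.
NUMBER_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(%|percent|per cent)?")


def parse_quantity(mention: str) -> Optional[Tuple[float, Optional[str]]]:
    """Parse the first number in a short mention span, plus its unit if any.

    Expects a mention span such as "about 11 percent", not a full sentence.
    """
    match = NUMBER_WITH_UNIT.search(mention)
    if match is None:
        return None
    value = float(match.group(1))
    unit = UNIT_ALIASES.get(match.group(2)) if match.group(2) else None
    return value, unit


def matches_kb_value(
    mention: str,
    kb_value: float,
    kb_unit: Optional[str],
    rel_tol: float = 0.05,  # assumed tolerance, not a value from the paper
) -> bool:
    """Return True if the mention has the same unit as the knowledge-base
    value and lies within a relative tolerance of it."""
    parsed = parse_quantity(mention)
    if parsed is None:
        return False
    value, unit = parsed
    if unit != kb_unit:
        return False  # same digits with a different unit is a different fact
    return abs(value - kb_value) <= rel_tol * abs(kb_value)


if __name__ == "__main__":
    # inflation rate(India, 10.9%) against rounded and mismatched mentions
    print(matches_kb_value("about 11 percent", 10.9, "percent"))
    print(matches_kb_value("11 million", 10.9, "percent"))
```

Exact string or value equality would miss the rounded mention "about 11 percent", while matching on the number alone would wrongly accept "11 million"; the unit check and the tolerance address these two failure modes separately.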
