Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

The rise of “Big Data” analytics over unstructured text has led to renewed interest in information extraction (IE). We surveyed the landscape of IE technologies and identified a major disconnect between industry and academia: while rule-based IE dominates the commercial world, academia widely regards it as a dead-end technology. We believe the disconnect stems from the way the two communities measure the benefits and costs of IE, and from academia’s perception that rule-based IE is devoid of research challenges. We make a case for the importance of rule-based IE to industry practitioners. We then lay out a research agenda for advancing the state of the art in rule-based IE systems, which we believe has the potential to bridge the gap between academic research and industry practice.
