SystemT: A Declarative Information Extraction System

Emerging text-intensive enterprise applications such as social analytics and semantic search pose new challenges of scalability and usability to Information Extraction (IE) systems. This paper presents SystemT, a declarative IE system that addresses these challenges and has been deployed in a wide range of enterprise applications. SystemT facilitates the development of high-quality, complex annotators by providing a highly expressive language and an advanced development environment. It also includes a cost-based optimizer and a high-performance, flexible runtime with a minimal memory footprint. We present SystemT as a useful resource that is freely available, and as an opportunity to promote research in building scalable and usable IE systems.