Evaluating Automatic Learning of Structure for Event Extraction

Analysts engaged in monitoring and forecasting benefit from structured representations of domain knowledge and societal events, which allow advanced analytics and predictive data models to be applied to large amounts of temporally extended data. However, extracting structured data from unstructured data typically requires domain-specific software that is costly, takes months to years to develop, and cannot adapt to changing domains. In this paper we consider the operational usefulness of an approach pioneered by Chambers and Jurafsky (Template-based information extraction without the templates, 2011, [1]) that automatically learns structured domain knowledge from unstructured text in the form of event templates, which are then used to automatically extract structured events from text. We generalize this approach and apply it to operationally relevant corpora from Brazil, Mexico, Ukraine, and Pakistan that focus on societal protests and the provision of aid. We find that we can generate compelling event templates that correspond to event types described by Conflict and Mediation Event Observations (CAMEO) codes (Retrieved from Computational Event Data System, 2014, [2]), which existing state-of-the-art systems use to label event types. Additionally, we are able to learn event templates that capture more nuance than the CAMEO codes represent, as well as entirely new and interesting event types. To automate our experimentation, we describe novel automated metrics that allow us to run multiple experiments in batch while receiving automated feedback on the quality of the results from each run. These metrics indicate significant overlap between the events we extract and those extracted by existing systems.

[1] Romaric Besançon, et al. Generative Event Schema Induction with Entity Disambiguation, 2015, ACL.

[2] Francis Ferraro, et al. Script Induction as Language Modeling, 2015, EMNLP.

[3] Ellen Riloff, et al. Exploiting Subjectivity Classification to Improve Information Extraction, 2005, AAAI.

[4] Nathanael Chambers, et al. Template-Based Information Extraction without the Templates, 2011, ACL.

[5] Jackie Chi Kit Cheung, et al. Probabilistic Frame Induction, 2013, NAACL.

[6] Razvan C. Bunescu, et al. Collective Information Extraction with Relational Markov Networks, 2004, ACL.

[7] Siddharth Patwardhan, et al. A Unified Model of Phrasal and Sentential Evidence for Information Extraction, 2009, EMNLP.

[8] Siddharth Patwardhan, et al. Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions, 2007, EMNLP.

[9] Marie-Francine Moens, et al. Skip N-grams and Ranking Functions for Predicting Script Events, 2012, EACL.

[10] Nathanael Chambers, et al. Event Schema Induction with a Probabilistic Entity-Driven Model, 2013, EMNLP.

[11] Ralph Grishman, et al. An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition, 2003, ACL.

[12] Raymond J. Mooney, et al. Statistical Script Learning with Multi-Argument Events, 2014, EACL.

[13] Vasileios Hatzivassiloglou, et al. Automatic Creation of Domain Templates, 2006, ACL.

[14] Dayne Freitag, et al. Toward General-Purpose Learning for Information Extraction, 1998, ACL.

[15] Nathanael Chambers, et al. Unsupervised Learning of Narrative Schemas and their Participants, 2009, ACL.

[16] Ellen Riloff, et al. Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts, 2011, ACL.

[17] Ellen Riloff, et al. An Empirical Approach to Conceptual Case Frame Acquisition, 1998, VLC@COLING/ACL.

[18] Regina Barzilay, et al. In-domain Relation Discovery with Meta-constraints via Posterior Regularization, 2011, ACL.

[19] Lynette Hirschman, et al. Evaluating Message Understanding Systems: An Analysis of the Third Message Understanding Conference (MUC-3), 1993, CL.

[20] Hwee Tou Ng, et al. Closing the Gap: Learning-Based Information Extraction Rivaling Knowledge-Engineering Methods, 2003, ACL.

[21] Katrin Erk, et al. Exemplar-Based Models for Word Meaning in Context, 2010, ACL.

[22] Nathanael Chambers, et al. Unsupervised Learning of Narrative Event Chains, 2008, ACL.

[23] Oren Etzioni, et al. Generating Coherent Event Schemas at Scale, 2013, EMNLP.