Automatic Extraction of Events from Open Source Text for Predictive Forecasting

Automated analysis of news reports is a key enabling technology for predictive models of political instability. To date, the standard approach to this analytic task has been embodied in systems such as KEDS/TABARI [1], which use manually generated rules and shallow parsing techniques to identify events and their participants in text. In this chapter we explore an alternative approach to event extraction based on BBN SERIF™ and BBN OnTopic™, two state-of-the-art statistical natural language processing engines. We empirically compare this new approach to existing event extraction techniques along five dimensions: (1) Accuracy: when an event is reported by the system, how often is it correct? (2) Coverage: how many true events are correctly reported by the system? (3) Filtering of historical events: how reliably are historical events (e.g., references to 9/11) filtered out of the current event data stream? (4) Topic-based event filtering: how well do systems use document topic to filter out red herrings, such as sports stories describing “clashes” between two countries on the playing field? (5) Domain shift: how well do event extraction models perform on data originating from diverse sources? On all five dimensions we show significant improvement over the state of the art by applying statistical natural language processing techniques. It is our hope that these results will lead to greater acceptance of automated event coding by the creators and consumers of social science models that depend on event data, and that they will provide a new way to improve the accuracy of those predictive models.
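
The first two evaluation dimensions correspond to the familiar precision and recall measures over coded events. The sketch below illustrates one way to compute them, assuming events are compared as exact-match tuples; the `Event` fields, the `precision_recall` helper, and the example codes are hypothetical and are not drawn from the SERIF or TABARI output formats.

```python
# Minimal sketch of accuracy (precision) and coverage (recall) over
# coded events. The event representation here is an assumption, not
# the actual SERIF/TABARI schema.
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so events can live in sets
class Event:
    """A coded event: who did what to whom, and when."""
    event_type: str  # e.g. an ontology label such as "FIGHT" (illustrative)
    source: str      # acting entity, e.g. "ISR"
    target: str      # entity acted upon, e.g. "PSE"
    date: str        # ISO date of the event itself, not the news report


def precision_recall(system: set[Event], gold: set[Event]) -> tuple[float, float]:
    """Accuracy: of the events the system reports, how many are correct?
    Coverage: of the true (gold) events, how many does the system report?"""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = {
        Event("FIGHT", "ISR", "PSE", "2004-05-18"),
        Event("CONSULT", "USA", "IRQ", "2004-05-19"),
    }
    system = {
        Event("FIGHT", "ISR", "PSE", "2004-05-18"),  # correct extraction
        Event("FIGHT", "USA", "AUS", "2004-05-20"),  # spurious: a sports "clash"
    }
    p, r = precision_recall(system, gold)
    print(f"accuracy (precision) = {p:.2f}, coverage (recall) = {r:.2f}")
```

In practice, evaluations of this kind often relax exact matching (e.g., crediting partial actor matches), but the strict set-intersection form above is enough to make the accuracy/coverage trade-off concrete.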