Cluster-Centric Approach to News Event Extraction

This paper presents a real-time, multilingual news event extraction system developed at the Joint Research Centre of the European Commission. The system accurately and efficiently extracts violent events and natural disasters from online news. It deploys a linguistically lightweight approach in which clusters of news articles are heavily exploited at all stages of processing. The paper covers the system's architecture, real-time news clustering, geolocation of clusters, development of the event extraction grammar, adaptation of the system to new languages, cluster-level information fusion, visual event tracking, and accuracy evaluation.
