Recognizing Identical Events with Graph Kernels

Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizing identical real-world events based on a structured, event-oriented document representation. We structure documents as graphs of event mentions and use graph kernels to measure the similarity between document pairs. Our experiments indicate that the proposed graph-based approach can outperform the traditional vector space model, and is especially suitable for distinguishing between topically similar, yet non-identical events.
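The abstract does not say which graph kernel, node labels, or edge labels are used, so the following is only an illustrative sketch under stated assumptions. It assumes that each event mention is a node labeled with (hypothetically) the lemma of its anchor word, and that edges carry relation labels such as "BEFORE". It then compares two such graphs with a truncated random-walk kernel on the direct product graph. This is a standard graph kernel family, not necessarily the one used in this work. All names (`EventGraph`, `product_graph`, `walk_kernel`, `normalized_kernel`, `max_len`, `decay`) and the toy documents are hypothetical.

```python
from itertools import product

# Hypothetical representation (an assumption, not taken from the paper):
# nodes map a mention id to a label (e.g. the lemma of the event anchor),
# edges map (source id, target id) to a relation label (e.g. "BEFORE").
EventGraph = tuple[dict[str, str], dict[tuple[str, str], str]]


def product_graph(g1: EventGraph, g2: EventGraph):
    """Build the direct product graph of two labeled event graphs."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # A product node pairs two mentions that carry the same label.
    nodes = [(a, b) for a, b in product(nodes1, nodes2) if nodes1[a] == nodes2[b]]
    node_set = set(nodes)
    adj = {v: [] for v in nodes}
    # A product edge pairs two edges with the same relation label whose
    # endpoints both form product nodes.
    for (a_src, a_dst), rel_a in edges1.items():
        for (b_src, b_dst), rel_b in edges2.items():
            src, dst = (a_src, b_src), (a_dst, b_dst)
            if rel_a == rel_b and src in node_set and dst in node_set:
                adj[src].append(dst)
    return nodes, adj


def walk_kernel(g1: EventGraph, g2: EventGraph, max_len: int = 3, decay: float = 0.5) -> float:
    """Weighted count of pairs of identically labeled walks (one walk in each
    graph) with up to max_len edges; a walk of length k is weighted decay ** k."""
    nodes, adj = product_graph(g1, g2)
    counts = {v: 1.0 for v in nodes}  # walks of length 0 ending at each node
    total = float(len(nodes))
    for k in range(1, max_len + 1):
        # Extend every walk by one product edge.
        nxt = {v: 0.0 for v in nodes}
        for v, c in counts.items():
            for w in adj[v]:
                nxt[w] += c
        counts = nxt
        total += decay ** k * sum(counts.values())
    return total


def normalized_kernel(g1: EventGraph, g2: EventGraph, **kw) -> float:
    """Cosine-style normalization: a non-empty graph compared with itself scores 1."""
    k12 = walk_kernel(g1, g2, **kw)
    denom = (walk_kernel(g1, g1, **kw) * walk_kernel(g2, g2, **kw)) ** 0.5
    return k12 / denom if denom else 0.0


# Toy example: two hypothetical documents about the same earthquake.
quake_a = ({"e1": "strike", "e2": "kill", "e3": "rescue"},
           {("e1", "e2"): "BEFORE", ("e2", "e3"): "BEFORE"})
quake_b = ({"x1": "strike", "x2": "kill"}, {("x1", "x2"): "BEFORE"})
print(round(normalized_kernel(quake_a, quake_b), 3))  # 0.767
```

Unlike a bag-of-words comparison, this score rewards two documents only when their shared event mentions are also connected by the same relations. Normalizing by each graph's self-similarity keeps scores comparable across documents with different numbers of event mentions.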
