Important Events in the Past, Present, and Future

We address the problem of identifying important events in the past, present, and future from semantically annotated, large-scale document collections. The semantic annotations we consider are named entities (e.g., persons, locations, organizations) and temporal expressions (e.g., "during the 1990s"). More specifically, given a time period of interest, our objective is to identify, rank, and describe the important events that happened during it. Our approach, P2F Miner, uses frequent itemset mining to identify events and to group the sentences related to them. It ranks the identified events using an information-theoretic measure and, for each event, selects a representative sentence as its description. Experiments on ClueWeb09, using events listed in Wikipedia year articles as ground truth, show that our approach is effective and outperforms a baseline based on statistical language models.
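
The three steps named in the abstract (identify, rank, describe) can be illustrated with a minimal sketch. This is not the P2F Miner implementation: the abstract does not name the mining algorithm, the ranking measure, or the sentence-selection rule, so the sketch assumes brute-force itemset enumeration, a pointwise-mutual-information style score, and a "fewest extra annotations" rule for picking the representative sentence. All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import log2

# Hypothetical sample input: (sentence text, set of annotations), where the
# annotations mix named entities and normalized temporal expressions.
SENTENCES = [
    ("Apollo 11 landed on the Moon in 1969.", {"Apollo 11", "Moon", "1969"}),
    ("In 1969, Apollo 11 carried Neil Armstrong to the Moon.",
     {"Apollo 11", "Moon", "1969", "Neil Armstrong"}),
    ("Woodstock took place in 1969.", {"Woodstock", "1969"}),
    ("The Moon orbits the Earth.", {"Moon", "Earth"}),
]


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Enumerate itemsets of size 2..max_size and keep the frequent ones.

    Brute force for clarity (an assumption); a real system would use an
    algorithm such as Apriori or FP-growth. Returns {itemset: sentence ids}.
    """
    support = defaultdict(set)
    for idx, items in enumerate(transactions):
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(items), size):
                support[frozenset(combo)].add(idx)
    return {s: ids for s, ids in support.items() if len(ids) >= min_support}


def pmi_score(itemset, sentence_ids, transactions):
    """PMI generalized to several items (assumed ranking measure):
    log2 of the joint frequency over the product of item frequencies."""
    n = len(transactions)
    joint = len(sentence_ids) / n
    independent = 1.0
    for item in itemset:
        independent *= sum(item in t for t in transactions) / n
    return log2(joint / independent)


def rank_events(sentences, min_support=2):
    """Return (score, sorted itemset, representative sentence), best first."""
    transactions = [annotations for _, annotations in sentences]
    events = []
    for itemset, ids in mine_frequent_itemsets(transactions, min_support).items():
        score = pmi_score(itemset, ids, transactions)
        # Assumed selection rule: the supporting sentence with the fewest
        # annotations outside the itemset, then the shortest text.
        best = min(ids, key=lambda i: (len(transactions[i] - itemset),
                                       len(sentences[i][0]), i))
        events.append((score, sorted(itemset), sentences[best][0]))
    events.sort(key=lambda e: (-e[0], e[1]))
    return events


if __name__ == "__main__":
    for score, itemset, description in rank_events(SENTENCES):
        print(f"{score:.3f}  {itemset}  {description}")
```

Each frequent itemset here stands in for one candidate event, and the sentences that contain it form the group from which the description is drawn. Restricting the input sentences to those whose temporal expressions fall within the time period of interest would be one way to scope the output to that period.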
