Leveraging linked entities to estimate focus time of short texts

Time is a useful dimension to explore in text databases, especially when historical and factual information is concerned. Because documents generally refer to different events and time periods, estimating the focus time of key sentences, defined as the time the content refers to, is a crucial step in temporally annotating a document. In this paper, we leverage a bag-of-linked-entities representation of sentences, together with temporal information from Wikipedia and DBpedia, to implement a novel approach to focus time estimation. We evaluate our approach on sample datasets and compare it with a state-of-the-art method, observing improvements in mean reciprocal rank (MRR).
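
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how MRR could be computed for focus time estimation. It assumes each sentence comes with a ranked list of candidate years and a single gold year. The function names and the year-level granularity are illustrative assumptions, not details taken from the paper.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], gold: Hashable) -> float:
    """Return 1/rank of the gold item in the ranked list, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]], golds: Sequence[Hashable]
) -> float:
    """Average the reciprocal ranks over all evaluated sentences."""
    if len(rankings) != len(golds):
        raise ValueError("rankings and golds must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds))
    return total / len(rankings)


# Hypothetical example: ranked candidate years for three sentences.
example_rankings = [[1945, 1939, 1914], [1969, 1989], [1789]]
example_golds = [1939, 1969, 1800]
example_mrr = mean_reciprocal_rank(example_rankings, example_golds)
```

In this hypothetical example the gold year is ranked second for the first sentence, first for the second, and is missing for the third, so the reciprocal ranks are 0.5, 1.0, and 0.0. A higher MRR means the correct time period tends to appear nearer the top of the ranking.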
