Supervised language modeling for temporal resolution of texts

We investigate the temporal resolution of documents, such as determining the publication date of a story from its text alone. We describe and evaluate a model that builds histograms encoding the probability of different time periods for a document. We construct these histograms from the Kullback-Leibler divergence between the language model of a test document and supervised language models for each time interval. Initial results indicate that this language modeling approach is effective for predicting the publication dates of short stories, which contain few explicit mentions of years.
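A minimal sketch of the histogram construction follows. The abstract does not specify the tokenization, the smoothing, the direction of the divergence, or how divergences become histogram values. The sketch therefore assumes Lidstone-smoothed unigram models over a shared vocabulary and computes KL(test || interval). It scores each interval by exp(-KL), normalized to sum to one. The function names (`unigram_lm`, `kl_divergence`, `temporal_histogram`) and the toy intervals are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=0.1):
    """Lidstone-smoothed unigram model over a shared vocabulary (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; smoothing guarantees q[w] > 0 for every w."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def temporal_histogram(test_tokens, interval_tokens, alpha=0.1):
    """Return a normalized score per interval label.

    interval_tokens maps a (hypothetical) interval label to the pooled tokens of
    its training documents. Scores are exp(-KL(test || interval)), normalized to
    sum to 1; this KL-to-probability mapping is an assumption.
    """
    vocab = set(test_tokens)
    for tokens in interval_tokens.values():
        vocab.update(tokens)
    test_lm = unigram_lm(test_tokens, vocab, alpha)
    divs = {
        label: kl_divergence(test_lm, unigram_lm(tokens, vocab, alpha))
        for label, tokens in interval_tokens.items()
    }
    # Subtract the smallest divergence before exponentiating to avoid underflow.
    d_min = min(divs.values())
    weights = {label: math.exp(-(d - d_min)) for label, d in divs.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


# Toy usage with two hypothetical intervals.
intervals = {
    "1900-1909": "the horse and carriage rode to town".split(),
    "2000-2009": "the email arrived on her mobile phone".split(),
}
hist = temporal_histogram("she checked her email on the phone".split(), intervals)
best = max(hist, key=hist.get)
print(best)  # 2000-2009
```

The function returns a full distribution over intervals rather than a single label. This lets a caller inspect how probability mass spreads across neighboring periods before committing to a single prediction.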