Temporal Text Ranking and Automatic Dating of Texts

This paper presents a novel approach to temporal text classification that combines text ranking and probability for the automatic dating of historical texts. The method was applied to three historical corpora: one English, one Portuguese and one Romanian. Using a fully automated approach with very basic features, it achieved accuracies ranging from 83% to 93%.
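The abstract names the ingredients (pairwise text ranking, a probabilistic date estimate, basic features) but not the exact procedure. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes relative word frequencies as the "very basic features", a pairwise logistic-regression ranker trained on feature differences (a standard learning-to-rank transform), and a hypothetical estimator that picks the candidate year most consistent with the ranker's "newer than" probabilities. The function names (`featurize`, `pairwise_data`, `train_ranker`, `estimate_year`) and the toy corpus are illustrative only.

```python
import math
from itertools import combinations


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def featurize(text, vocab):
    # Assumed "very basic" features: relative frequency of each vocabulary word.
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return [tokens.count(w) / n for w in vocab]


def pairwise_data(X, years):
    # Pairwise transform: x_i - x_j is labelled 1 if document i is newer than j.
    pairs = []
    for i, j in combinations(range(len(X)), 2):
        if years[i] == years[j]:
            continue  # equal dates carry no ordering information
        diff = [a - b for a, b in zip(X[i], X[j])]
        label = 1 if years[i] > years[j] else 0
        pairs.append((diff, label))
        # Mirrored pair keeps the two classes balanced.
        pairs.append(([-d for d in diff], 1 - label))
    return pairs


def train_ranker(pairs, dim, lr=0.5, epochs=200, l2=1e-3):
    # Logistic regression without intercept, full-batch gradient descent.
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]  # L2 penalty gradient
        for x, y in pairs:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k in range(dim):
                grad[k] += (p - y) * x[k] / len(pairs)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def estimate_year(x_new, X, years, w, eps=1e-9):
    # p_k = P(new document is newer than training document k), from the ranker.
    probs = [sigmoid(sum(wi * (a - b) for wi, a, b in zip(w, x_new, xk)))
             for xk in X]
    best_year, best_ll = None, -math.inf
    # Hypothetical estimator: maximise the log-likelihood of a candidate year
    # given all pairwise "newer/older" probabilities.
    for y in sorted(set(years)):
        ll = 0.0
        for p, yk in zip(probs, years):
            if y > yk:
                ll += math.log(p + eps)
            elif y < yk:
                ll += math.log(1.0 - p + eps)
            else:
                # Same date: scored as an even split between newer and older.
                ll += 0.5 * (math.log(p + eps) + math.log(1.0 - p + eps))
        if ll > best_ll:
            best_year, best_ll = y, ll
    return best_year


if __name__ == "__main__":
    vocab = ["thou", "you"]
    texts = ["thou thou thou thou", "thou thou thou you", "thou thou you you",
             "thou you you you", "you you you you"]
    years = [1600, 1650, 1700, 1750, 1800]
    X = [featurize(t, vocab) for t in texts]
    w = train_ranker(pairwise_data(X, years), dim=len(vocab))
    print(estimate_year(featurize("You you thou you", vocab), X, years, w))  # → 1750
```

Training on feature differences without an intercept keeps the ranker antisymmetric, so P(a newer than b) = 1 - P(b newer than a); the date estimate then needs no separate classifier per period.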
