Temporal specificity-based text classification for information retrieval

Time is an important aspect of temporal information retrieval (TIR), a subfield of information retrieval (IR). Web search engines such as Google and Bing are common examples of IR systems. An important constituent of a search engine is news retrieval, where users express their information needs as temporal queries and are usually interested in news documents that focus on a particular time period. Existing search engines rarely fulfill these temporal information requirements because they ignore the temporal information available in the content of news documents, also known as document focus time. Furthermore, a news document often contains information related to multiple time periods, which makes identifying its focus time a challenging task. It is therefore necessary to classify news documents by temporal specificity before the temporal information can be used in the retrieval process. In this study, we formulate the temporal specificity problem as a time-based classification task in which news documents are assigned to one of three temporal classes: high temporal specificity, medium temporal specificity, and low temporal specificity. Two classification approaches are proposed: a rule-based approach and a temporal specificity score (TSS)-based approach. In the former, news documents are classified using a defined set of rules based on temporal features. The latter classifies news documents according to a TSS computed from the temporal features. The results of the proposed techniques are compared with four machine learning classification algorithms: Bayes net, support vector machine, random forest, and decision tree. The results show that the proposed rule-based classifier outperforms the four algorithms, achieving 82% accuracy, whereas TSS-based classification achieves 77% accuracy.
