Exploiting temporal information in Web search

Time plays an important role in Web search, because most Web pages contain temporal information and many Web queries are time-related. How to integrate temporal information into Web search engines has been a research focus in recent years. However, traditional search engines offer little support for processing temporal-textual Web queries. To address this problem, this paper concentrates on extracting the focused time of Web pages, that is, the most appropriate time associated with a page, and then uses the focused time to improve search efficiency for time-sensitive queries. Three critical issues are studied in depth. The first is to extract implicit temporal expressions from Web pages. The second is to determine the focused time among all the extracted temporal information. The third is to integrate the focused time into a search engine. For the first issue, we propose a new dynamic approach to resolving the implicit temporal expressions in Web pages. For the second issue, we present a score model that determines the focused time of a Web page; the model takes into account both the frequency of temporal information in the page and the containment relationships among temporal information. For the third issue, we combine the textual similarity and the temporal similarity between queries and documents in the ranking process. To evaluate the effectiveness and efficiency of the proposed approaches, we build a prototype system called Time-Aware Search Engine (TASE). TASE extracts both the explicit and the implicit temporal expressions in Web pages, calculates a relevance score between a Web page and each temporal expression, and re-ranks search results based on the temporal-textual relevance between Web pages and queries. Finally, we conduct experiments on real data sets. The results show that our approach achieves high accuracy in resolving implicit temporal expressions and extracting focused time, and that it ranks results for time-sensitive Web queries more effectively than competing algorithms.
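
As a rough illustration of the two ideas named above (a focused-time score built from frequency and containment, and a ranking that mixes textual and temporal similarity), the following Python sketch may help. It is not the model used in TASE: the tuple representation of times, the weight `alpha`, the linear mixing weight `lam`, and the function names are all assumptions made for this example.

```python
from collections import Counter

# Assumption: a temporal expression is a tuple of increasing granularity,
# e.g. (2010,) for a year, (2010, 5) for a month, (2010, 5, 17) for a day.

def contains(coarse, fine):
    """True if `coarse` strictly contains `fine`, e.g. (2010,) contains (2010, 5)."""
    return len(coarse) < len(fine) and fine[:len(coarse)] == coarse

def focused_time(mentions, alpha=0.5):
    """Score each distinct time by its own frequency plus a damped share
    (hypothetical weight `alpha`) of the finer-grained times it contains.
    Returns (best_time, scores), or (None, {}) when there are no mentions."""
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return None, {}
    scores = {}
    for cand, freq in counts.items():
        contained = sum(n for other, n in counts.items() if contains(cand, other))
        scores[cand] = (freq + alpha * contained) / total
    best = max(scores, key=scores.get)
    return best, scores

def combined_score(text_sim, time_sim, lam=0.7):
    """Linear mix of textual and temporal similarity (an assumed combination)."""
    return lam * text_sim + (1 - lam) * time_sim

# Usage: a page that mentions May 2010 three times, 2010 once, and one other date.
mentions = [(2010, 5), (2010, 5), (2010, 5), (2010,), (2009, 12, 25)]
best, scores = focused_time(mentions)
print(best, scores)
print(combined_score(text_sim=0.8, time_sim=0.5))
```

In this sketch a coarse time such as a year gains partial credit from the months and days it contains, so a page that scatters many dates within one year can still resolve to that year. The actual weighting in the paper's score model may differ.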
