Adding the temporal dimension to search - a case study in publication search

The best-known search techniques are perhaps the PageRank and HITS algorithms. In this paper, we argue that these algorithms miss an important dimension: time. A page that was of high quality in the past may not be of high quality now or in the future. Both techniques favor older pages, because such pages have accumulated many in-links over time. New pages, which may be of high quality, have few or no in-links and are left behind. Research publication search has the same problem: under PageRank or HITS, older or classic papers rank high because of the large number of citations they received in the past. This paper studies the temporal dimension of search in the context of research publications. We propose a number of methods that address the problem by analyzing the behavior history and the source of each publication. We evaluate these methods empirically, and our results show that they are highly effective.
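
The age bias described above can be illustrated on a toy citation graph. The following is a minimal sketch, not the methods proposed in the paper: the paper names, years, the exponential decay rate, and the helper functions `pagerank` and `decayed` are all hypothetical choices made for illustration. It compares standard PageRank with a variant in which each citation passes on only a fraction of its rank, depending on the age of the citing paper.

```python
import math

# Hypothetical toy data: publication year of each paper.
year = {"classic": 1990, "new": 2003,
        "p1": 1992, "p2": 1994, "p3": 1996, "p4": 2004, "p5": 2004}
# (citing, cited) pairs: three old citations to "classic", two recent ones to "new".
edges = [("p1", "classic"), ("p2", "classic"), ("p3", "classic"),
         ("p4", "new"), ("p5", "new")]
NOW = 2005    # assumed "current" year
DECAY = 0.2   # assumed decay rate per year

def pagerank(nodes, edges, edge_weight, damping=0.85, iters=50):
    """Power-iteration PageRank where each citation passes on a fraction
    edge_weight(src, dst) in [0, 1] of the rank it would normally carry."""
    n = len(nodes)
    out_deg = {v: 0 for v in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        passed = {v: 0.0 for v in nodes}
        for src, dst in edges:
            passed[dst] += rank[src] * edge_weight(src, dst) / out_deg[src]
        # Rank that was not passed along a citation (papers with no references,
        # or weight lost to decay) is spread uniformly, so ranks still sum to 1.
        leftover = 1.0 - sum(passed.values())
        rank = {v: (1 - damping) / n + damping * (passed[v] + leftover / n)
                for v in nodes}
    return rank

def decayed(src, dst):
    # Older citing papers contribute exponentially less (an assumption).
    return math.exp(-DECAY * (NOW - year[src]))

nodes = list(year)
plain = pagerank(nodes, edges, lambda src, dst: 1.0)
timed = pagerank(nodes, edges, decayed)

print("plain:", plain["classic"] > plain["new"])   # classic outranks new
print("timed:", timed["new"] > timed["classic"])   # new outranks classic
```

With uniform weights the older paper wins on raw citation volume; once old citations are discounted, the recently cited paper ranks higher. The decay form and rate here are arbitrary and would need to be tuned or replaced in any real system.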
