Query-Log Based Authority Analysis for Web Information Search

The ongoing explosion of web information calls for more intelligent and personalized methods that improve search result quality for advanced queries. Query logs and click streams obtained from web browsers or search engines can contribute to better quality by exploiting the collaborative recommendations implicitly embedded in this information. This paper presents a new method that incorporates the notion of query nodes into the PageRank model and integrates the implicit relevance feedback given by click streams into the automated process of authority analysis. This approach generalizes the well-known random-surfer model into a random-expert model that mimics the behavior of an expert user in an extended session consisting of queries, query refinements, and result-navigation steps. The enhanced PageRank scores, coined QRank scores, can be computed offline; at query time they are combined with query-specific relevance measures with virtually no overhead. Our preliminary experiments, based on real-life query-log and click-stream traces from eight different trial users, indicate significant improvements in the precision of search results.
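To make the idea of query nodes concrete, the following is a minimal sketch, not the paper's actual QRank formulation. It assumes a simplified graph in which query nodes link to the pages clicked for them and to their refinements, with uniform transition probabilities and a standard damping factor. The function names, the `q:` node prefix, the toy data, and the linear combination in `combined_score` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_graph(links, clicks, refinements):
    """Build one graph over page nodes and query nodes (assumed edge types)."""
    graph = defaultdict(set)
    for src, dst in links:            # ordinary hyperlinks between pages
        graph[src].add(dst)
        graph[dst]                    # make sure the target exists as a node
    for query, page in clicks:        # click on a result: query node -> page
        graph["q:" + query].add(page)
        graph[page]
    for old, new in refinements:      # query refinement: query node -> query node
        graph["q:" + old].add("q:" + new)
        graph["q:" + new]
    return graph


def stationary_scores(graph, damping=0.85, iterations=100):
    """PageRank-style power iteration with uniform random jumps."""
    nodes = sorted(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Mass sitting on nodes without out-edges is spread over all nodes.
        dangling = sum(scores[v] for v in nodes if not graph[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * scores[v] / len(graph[v])
                for w in graph[v]:
                    nxt[w] += share
        scores = nxt
    return scores


def combined_score(relevance, authority, weight=0.5):
    """Query-time mix of relevance and offline authority (assumed linear form)."""
    return weight * relevance + (1.0 - weight) * authority


# Toy data (invented): two pages with identical link structure, one of them clicked.
links = [("hub", "p1"), ("hub", "p2"), ("p1", "hub"), ("p2", "hub")]
clicks = [("jaguar", "p1"), ("jaguar car", "p1")]
refinements = [("jaguar", "jaguar car")]

scores = stationary_scores(build_graph(links, clicks, refinements))
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node:15s} {scores[node]:.4f}")
```

In this toy graph, `p1` and `p2` are symmetric with respect to hyperlinks, so any difference in their scores comes only from the click edges. The iteration runs offline; at query time only the cheap `combined_score` step would be needed, which matches the abstract's claim of virtually no overhead under these assumptions.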
