Mining the search trails of surfing crowds: identifying relevant websites from user activity

This paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has previously been shown that user interactions with search engines can be employed to improve document ranking, browsing behavior that occurs beyond search result pages has been largely overlooked in prior work. We demonstrate that users' post-search browsing activity strongly reflects implicit endorsement of visited pages, which allows the topical relevance of Web resources to be estimated by mining large-scale datasets of search trails. We present heuristic and probabilistic algorithms that rely on such datasets to suggest authoritative websites for search queries. Experimental evaluation shows that exploiting complete post-search browsing trails outperforms alternatives such as clickthrough logs when each is used in isolation, and yields accuracy improvements when employed as a feature in learning to rank for Web search.
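As a rough illustration of the idea of mining search trails, the sketch below counts how often each website is visited on trails that follow queries containing a given term, then ranks websites for a new query by summing those counts. It is a minimal co-occurrence heuristic written for this summary, not the heuristic or probabilistic algorithms evaluated in the paper. The trail format (a query string paired with a list of visited URLs), the use of the host name as the "website" unit, and the function names `site_of`, `build_term_site_counts`, and `suggest_sites` are all assumptions.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Reduce a URL to its host name (assumed here to stand for the website)."""
    return urlparse(url).netloc.lower()


def build_term_site_counts(trails):
    """Count, per query term, the trails on which each website was visited.

    `trails` is an iterable of (query, visited_urls) pairs; this input
    format is hypothetical and chosen only for the example.
    """
    counts = defaultdict(Counter)
    for query, urls in trails:
        # Count each site at most once per trail so that a long visit to
        # one site does not dominate the statistics.
        sites = {site_of(u) for u in urls}
        for term in set(query.lower().split()):
            for site in sites:
                counts[term][site] += 1
    return counts


def suggest_sites(query, counts, k=3):
    """Rank websites by summed term-site counts over the query's terms."""
    scores = Counter()
    for term in set(query.lower().split()):
        scores.update(counts.get(term, {}))
    return [site for site, _ in scores.most_common(k)]
```

A more faithful variant would presumably weight terms and sites (for example, discounting sites that appear on trails for nearly every query) and would use dwell time or trail position; those refinements are left out here because the abstract does not specify them.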
