Organizing User Search Histories

Users increasingly pursue complex task-oriented goals on the web, such as making travel arrangements, managing finances, or planning purchases. To this end, they usually break a task down into a few interdependent steps and issue multiple queries around these steps repeatedly over long periods of time. To better support users in these long-term information quests, search engines keep track of their queries and clicks. In this paper, we study the problem of organizing a user's historical queries into groups in a dynamic and automated fashion. Automatically identifying query groups is helpful for a number of search engine components and applications, such as query suggestions, result ranking, query alterations, sessionization, and collaborative search. We go beyond approaches that rely on textual similarity or time thresholds and propose a more robust approach that leverages search query logs. We experimentally study the performance of different techniques and showcase their potential, especially when combined.
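To make the baseline concrete, the following is a minimal sketch of the kind of grouping the abstract says it goes beyond: queries are attached to an existing group only if they arrive within a time threshold and share words with a group member. It is not the paper's method. The names `Query`, `jaccard`, `assign_to_groups`, and the parameters `time_gap` and `min_sim` are hypothetical, and the threshold values are arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    timestamp: float  # seconds since an arbitrary start (assumed unit)


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two query strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def assign_to_groups(queries, time_gap=1800.0, min_sim=0.2):
    """Group queries online using a time threshold plus textual similarity.

    time_gap and min_sim are illustrative values, not taken from the paper.
    """
    groups = []
    for q in sorted(queries, key=lambda item: item.timestamp):
        best, best_sim = None, 0.0
        for g in groups:
            # Skip groups whose most recent query is too old.
            if q.timestamp - g[-1].timestamp > time_gap:
                continue
            # Similarity to the closest member of the group.
            sim = max(jaccard(q.text, m.text) for m in g)
            if sim > best_sim:
                best, best_sim = g, sim
        if best is not None and best_sim >= min_sim:
            best.append(q)
        else:
            groups.append([q])
    return groups


history = [
    Query("hotel paris", 0),
    Query("cheap hotel paris", 60),
    Query("mortgage rates", 120),
    Query("paris flights", 10_000),
]
groups = assign_to_groups(history)
for g in groups:
    print([q.text for q in g])
```

In this toy history, the late "paris flights" query ends up in its own group even though it plausibly belongs to the same trip-planning task as the hotel queries. This is the kind of failure of time thresholds that motivates using evidence from search query logs instead.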
