Patterns of query reformulation during Web searching

Query reformulation is a key user behavior during Web search. Our research goal is to develop predictive models of query reformulation during Web searching. This article reports results from a study in which we automatically classified the query-reformulation patterns for 964,780 Web searching sessions, composed of 1,523,072 queries, to predict the next query reformulation. We employed an n-gram modeling approach to describe the probability of users transitioning from one query-reformulation state to another and thereby predict their next state. We developed first-, second-, third-, and fourth-order models and evaluated each model for accuracy of prediction, coverage of the dataset, and complexity of the possible pattern set. The results show that Reformulation and Assistance account for approximately 45% of all query reformulations; furthermore, the first- and second-order models provide the best predictability, between 28% and 40% overall and higher than 70% for some patterns. These results imply that the n-gram approach can be used to improve searching systems and searching assistance. © 2009 Wiley Periodicals, Inc.
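The abstract describes the n-gram approach only at a high level, so the following is a minimal sketch of one plausible reading, not the study's implementation. It assumes that each session has already been classified into a sequence of reformulation-state labels (the automatic classifier is not reproduced), that an order-k model conditions on the previous k states (so the first-order model is a bigram model), and that the prediction is the most frequent successor of that history. "Coverage" is taken here to mean the share of positions whose history was seen in training, and "accuracy" the share of covered positions predicted correctly; the paper's exact definitions may differ. The class and method names, and every state label other than Reformulation and Assistance, are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple

State = str                  # a reformulation-state label, e.g. "Reformulation"
History = Tuple[State, ...]  # the `order` most recent states


class NGramReformulationModel:
    """Order-k Markov (n-gram, n = k + 1) model over query-reformulation states.

    Hypothetical class for illustration; not the study's implementation.
    """

    def __init__(self, order: int) -> None:
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # history of `order` states -> counts of the state that came next
        self.counts: Dict[History, Counter] = defaultdict(Counter)

    def fit(self, sessions: Sequence[Sequence[State]]) -> "NGramReformulationModel":
        """Count every (history, next state) transition in the training sessions."""
        for session in sessions:
            for i in range(self.order, len(session)):
                history = tuple(session[i - self.order:i])
                self.counts[history][session[i]] += 1
        return self

    def transition_probability(self, history: Sequence[State], nxt: State) -> float:
        """Maximum-likelihood estimate of P(nxt | last `order` states of history)."""
        successors = self.counts.get(tuple(history[-self.order:]))
        if not successors:
            return 0.0
        return successors[nxt] / sum(successors.values())

    def predict(self, history: Sequence[State]) -> Optional[State]:
        """Most frequent successor of the last `order` states, or None if unseen."""
        if len(history) < self.order:
            return None
        successors = self.counts.get(tuple(history[-self.order:]))
        if not successors:
            return None
        return successors.most_common(1)[0][0]

    def pattern_count(self) -> int:
        """Distinct histories observed (at most S**order for S state labels)."""
        return len(self.counts)

    def evaluate(self, sessions: Sequence[Sequence[State]]) -> Tuple[float, float]:
        """Return (accuracy, coverage) over all positions with a full history.

        Coverage: share of positions whose history was seen in training.
        Accuracy: share of covered positions predicted correctly.
        """
        total = covered = correct = 0
        for session in sessions:
            for i in range(self.order, len(session)):
                total += 1
                guess = self.predict(session[:i])
                if guess is not None:
                    covered += 1
                    correct += int(guess == session[i])
        accuracy = correct / covered if covered else 0.0
        coverage = covered / total if total else 0.0
        return accuracy, coverage


if __name__ == "__main__":
    # Toy sessions; every label except Reformulation and Assistance is a placeholder.
    train = [
        ["New", "Reformulation", "Assistance", "Reformulation"],
        ["New", "Reformulation", "Assistance"],
        ["New", "Specialization", "Reformulation"],
    ]
    for k in range(1, 5):
        model = NGramReformulationModel(order=k).fit(train)
        acc, cov = model.evaluate(train)
        print(f"order {k}: accuracy={acc:.2f} coverage={cov:.2f} "
              f"patterns={model.pattern_count()}")
```

The sketch uses raw maximum-likelihood counts with no smoothing or back-off. Under that choice, raising the order lengthens each history, so fewer test histories have been seen (coverage falls) while the set of possible patterns grows as S**order; this illustrates the accuracy, coverage, and complexity trade-off the abstract evaluates, without reproducing its numbers.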