Identifying Web search session patterns using cluster analysis: A comparison of three search environments

Session characteristics drawn from large transaction logs of three Web search environments (an academic Web site, a public search engine, and a consumer health information portal) were modeled using cluster analysis to determine whether coherent session groups emerged for each environment and whether the types of session groups were similar across the three environments. The analysis revealed three distinct clusters of session behaviors common to each environment: “hit and run” sessions on focused topics, relatively brief sessions on popular topics, and sustained sessions using obscure terms with greater query modification. The findings also revealed a shift in session characteristics over time for one of the datasets, away from “hit and run” sessions and toward more popular search topics. A better understanding of session characteristics can help system designers develop more responsive systems with search features that cater to identifiable groups of searchers based on their search behaviors. For example, a system might identify struggling searchers whose session behaviors match those identified in the current study and offer them context-sensitive help. © 2009 Wiley Periodicals, Inc.
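
To make the idea of grouping sessions by their characteristics concrete, the following is a minimal sketch of one possible approach, not the procedure used in the study. It assumes k-means clustering with a deterministic farthest-first initialization, and the three per-session features (queries per session, mean terms per query, share of queries that modify the previous query) and all data values are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical per-session features, invented for illustration:
# (queries per session, mean terms per query, share of modified queries)
SESSIONS = [
    (1, 2.0, 0.0), (1, 3.0, 0.0), (1, 2.0, 0.0),   # very short sessions
    (3, 2.0, 0.3), (2, 2.5, 0.5), (3, 3.0, 0.3),   # relatively brief sessions
    (8, 4.0, 0.8), (9, 3.5, 0.9), (7, 4.5, 0.7),   # sustained sessions
]


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (an assumed, deterministic choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=50):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


labels, centroids = kmeans(standardize(SESSIONS), k=3)
for session, label in zip(SESSIONS, labels):
    print(label, session)
```

A system built on this kind of grouping could, for instance, compare a live session's feature vector against the cluster centroids and treat membership in the sustained, heavily modified group as a hint that the searcher may need help. The number of clusters is fixed at three here only to mirror the three session types reported above.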
