Characterizing Search Behavior in Web Archives

Web archives are a vast source of information for mining the past. However, tools for exploring web archives are still in their infancy, in part because so little is known about their users. We contribute to this knowledge by presenting the first characterization of the search behavior of web archive users. From an analysis of their search logs, we obtained detailed statistics about the users' sessions, queries, terms and clicks. The results show that users spend little time and effort searching the past: they prefer short sessions composed of short queries and few clicks. Full-text search is preferred to URL search, but both are frequently used. There is strong evidence that users prefer the oldest documents over the newest, but they mostly search without any temporal restriction. We discuss these findings and their implications for the design of future web archives.
