What can be Found on the Web and How: A Characterization of Web Browsing Patterns

In this paper, we propose a novel approach to studying user browsing behavior, i.e., the ways users reach different pages on the Web. Specifically, we classify all user browsing paths leading to web pages into several types, or browsing patterns. To define browsing patterns, we consider three important points of the browsing path: its origin, the last page visited before the user enters the domain of the target page, and the referrer of the target page. Each point can be of several types, which leads to 56 possible patterns. The distribution of browsing paths over these patterns forms the navigational profile of a web page. We conducted a comprehensive large-scale study of the navigational profiles of different web pages. First, we demonstrate that the navigational profile of a web page carries crucial information about the properties of that page (e.g., its popularity and age). Second, we find that the Web consists of several typical non-overlapping clusters formed by pages with similar ranges of incoming traffic. These clusters can be characterized by the functionality of their pages.
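
To make the notion of a navigational profile concrete, the following is a minimal sketch, not the authors' implementation. It assumes each browsing path has already been reduced to a pattern, i.e., a triple of types for the origin, the last page before the target domain, and the target page referrer. The type labels used here ("search", "bookmark", "social", and so on) are hypothetical placeholders; the abstract does not list the actual types behind the 56 patterns.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class BrowsingPattern(NamedTuple):
    """One browsing pattern: the types of the three points of a path.

    The type labels are hypothetical; the abstract does not enumerate them.
    """
    origin: str    # type of the path's origin
    entry: str     # type of the last page before the target page's domain
    referrer: str  # type of the target page's referrer


def navigational_profile(
    paths: Iterable[BrowsingPattern],
) -> Dict[BrowsingPattern, float]:
    """Return the share of browsing paths that fall into each pattern."""
    counts = Counter(paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: n / total for pattern, n in counts.items()}


# Four illustrative paths that all lead to the same target page.
paths = [
    BrowsingPattern("search", "search", "internal"),
    BrowsingPattern("search", "search", "internal"),
    BrowsingPattern("bookmark", "none", "none"),
    BrowsingPattern("social", "external", "external"),
]
profile = navigational_profile(paths)
```

Under these assumptions, the profile of a page is a probability vector over patterns, so pages can be compared or clustered with any standard distance between distributions.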
