User centric walk: an integrated approach for modeling the browsing behavior of users on the Web

Evaluating the performance of Web applications usually requires analyzing sequences of user requests for specific Web pages. Such sequences can be obtained empirically, by recording real sequences of requests, or synthetically, by generating them from a formal model. In this paper, we present our Web browsing model and its implementation as part of our novel user centric walk algorithm. By taking into account both the hyperlink structure and the differing behavior of users on the Web, user centric walk allows us to generate accurate synthetic data that can be used in place of empirically obtained requests. Using empirical data, we also show that both the probability of choosing a particular hyperlink on a given page and the probability of a user leaving a page without following a hyperlink are best characterized by a power law. Finally, we demonstrate the flexibility and applicability of our model by correlating its output with empirical data in order to validate our approach.
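As a rough illustration of the kind of synthetic request generation described above, the following Python sketch walks a small link graph and picks each outgoing hyperlink with a Zipf-like power-law weight over link rank. It is not the user centric walk algorithm itself, whose details are not given here. The function names, the exponent `alpha`, the rank ordering of links, and the constant `leave_prob` (a placeholder for the power-law leaving probability mentioned in the abstract) are all assumptions made for this example.

```python
import random


def power_law_weights(n, alpha=1.0):
    """Return n normalized weights proportional to rank ** -alpha (rank 1 first)."""
    raw = [rank ** -alpha for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def synthetic_session(graph, start, leave_prob=0.2, alpha=1.0,
                      max_steps=50, rng=None):
    """Generate one synthetic sequence of page requests.

    graph maps a page to its outgoing links, assumed to be ordered by rank.
    leave_prob is a constant stand-in (assumption) for the probability of
    leaving a page without following a hyperlink.
    """
    rng = rng or random.Random()
    page = start
    session = [page]
    for _ in range(max_steps):
        links = graph.get(page, [])
        # The session ends at a dead end or when the user decides to leave.
        if not links or rng.random() < leave_prob:
            break
        weights = power_law_weights(len(links), alpha)
        page = rng.choices(links, weights=weights, k=1)[0]
        session.append(page)
    return session


if __name__ == "__main__":
    # Hypothetical three-page site used only for demonstration.
    site = {
        "home": ["news", "about"],
        "news": ["home", "about"],
        "about": ["home"],
    }
    print(synthetic_session(site, "home", rng=random.Random(42)))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which makes it easier to compare generated sessions against recorded request logs.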
