A NEW TRAFFIC MODEL FOR CURRENT USER WEB BROWSING BEHAVIOR

HTTP traffic models are widely used to represent user web browsing behaviour, so it is important that such a model be representative of a large variety of traffic and be continually updated to reflect the constantly evolving nature of web content and the exponential growth in the number of users. In this paper, we analyzed an extensive set of web proxy server logs to understand changes in network traffic patterns. We found significant gaps in the previously proposed methods; the major one is that it is almost impossible to distinguish a web request generated by a user click from one generated by embedded scripts and frames. As a result, we modified the definition of a web request boundary. Because pages now contain large numbers of embedded objects from several different off-site sources, which cannot be traced back to the original request by following the source addresses in TCP/IP headers alone, newer heuristics need to be devised. We present our methodology for analyzing Squid proxy logs in a way that preserves user privacy, and propose a new HTTP traffic model and traffic generator to represent current user web browsing behaviour. A comparison of independent statistics from the trace and the model shows a fair match.
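As a rough illustration of the kind of log processing described above, the following minimal sketch parses lines in Squid's native access.log format, replaces each client address with a keyed hash, discards URLs, and groups each user's requests into bursts separated by idle gaps. It is not the method of this paper: the key `SECRET_KEY`, the threshold `IDLE_GAP_SECONDS`, the function names, and the idle-gap rule itself are all assumptions chosen only for illustration.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical values, not taken from the paper.
SECRET_KEY = b"replace-with-a-random-secret"
IDLE_GAP_SECONDS = 1.0  # assumed idle time that separates two request bursts


def anonymize(client_ip: str) -> str:
    """Map a client address to a keyed pseudonym (HMAC-SHA256, truncated)."""
    digest = hmac.new(SECRET_KEY, client_ip.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def parse_line(line: str):
    """Parse one line of Squid's native access.log format.

    Field order: time elapsed client action/code size method URL
    ident hierarchy/peer content-type. The URL is deliberately dropped.
    """
    fields = line.split()
    if len(fields) < 10:
        return None  # skip malformed lines
    return {
        "time": float(fields[0]),
        "user": anonymize(fields[2]),
        "size": int(fields[4]),
        "method": fields[5],
        "content_type": fields[9],
    }


def group_into_bursts(records, gap=IDLE_GAP_SECONDS):
    """Group each user's requests into bursts split by idle gaps (assumed rule)."""
    by_user = defaultdict(list)
    for rec in records:
        if rec is not None:
            by_user[rec["user"]].append(rec)

    bursts = {}
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r["time"])
        groups = [[recs[0]]]
        for prev, cur in zip(recs, recs[1:]):
            if cur["time"] - prev["time"] > gap:
                groups.append([cur])  # long idle gap: start a new burst
            else:
                groups[-1].append(cur)  # likely an embedded object of the same page
        bursts[user] = groups
    return bursts
```

A fixed idle threshold is only one possible heuristic; it cannot by itself tell a user click from a script-generated request, which is the difficulty the abstract points out.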
