Web user session characterization via clustering techniques

We focus on the identification and definition of "Web user-sessions": aggregations of several TCP connections generated by the same source host, grouped on the basis of TCP connection opening times. Identifying a user-session is non-trivial. Traditional approaches rely on threshold-based mechanisms, which are very sensitive to the value chosen for the threshold, and that value may be difficult to set correctly. By applying clustering techniques, we define a novel methodology to identify Web user-sessions without requiring an a priori definition of threshold values. We analyze the user-sessions extracted from real traces and study their statistical properties. The study shows that Web user-session arrivals tend to follow a Poisson process, but correlation may arise during periods of anomalous network or host behavior.
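
The contrast between threshold-based and threshold-free session identification can be illustrated with a minimal sketch. It assumes that the TCP connection opening times (in seconds) of a single source host are already available. The function names are hypothetical. The clustering step here is a deliberately simple stand-in: a two-cluster k-means on the logarithms of the gaps between consecutive openings, separating "intra-session" from "inter-session" gaps. It is not necessarily the clustering procedure used in this work.

```python
import math
from typing import List


def sessions_by_threshold(times: List[float], threshold: float) -> List[List[float]]:
    """Threshold-based baseline: start a new session whenever the gap
    between consecutive opening times exceeds a fixed threshold (seconds)."""
    sessions: List[List[float]] = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def two_means_1d(values: List[float], iters: int = 100) -> float:
    """1-D k-means with k=2, initialised at min/max. Returns the midpoint
    between the two centroids, used as a data-driven gap boundary."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        small = [v for v in values if v <= mid]
        large = [v for v in values if v > mid]
        if not small or not large:  # all values equal: nothing to split
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if new_lo == lo and new_hi == hi:  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def sessions_by_clustering(times: List[float]) -> List[List[float]]:
    """Threshold-free variant (hypothetical stand-in): the split boundary
    is learned from the host's own gap distribution, not fixed a priori."""
    times = sorted(times)
    if len(times) < 3:  # too few gaps to cluster: treat as one session
        return [times] if times else []
    gaps = [b - a for a, b in zip(times, times[1:])]
    log_gaps = [math.log(max(g, 1e-6)) for g in gaps]  # guard against log(0)
    boundary = two_means_1d(log_gaps)
    sessions = [[times[0]]]
    for t, lg in zip(times[1:], log_gaps):
        if lg > boundary:  # "long" gap: a new session begins
            sessions.append([t])
        else:
            sessions[-1].append(t)
    return sessions
```

For example, on opening times `[0, 1, 2, 3, 100, 101, 102, 300, 301]` the clustering variant yields three sessions, while the threshold variant yields 9, 3 or 2 sessions for thresholds of 0.5 s, 50 s and 150 s respectively, which shows how strongly the result depends on the threshold value. A limitation of this particular stand-in is that 2-means always splits the gaps into two groups unless they are all equal, so a host with a single long burst of slightly irregular connections may be split spuriously.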

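The observation that session arrivals tend to be Poisson can be probed with a rough diagnostic. For a homogeneous Poisson process the inter-arrival times are independent and exponentially distributed, so their coefficient of variation (CV) is close to 1 and their lag-1 autocorrelation is close to 0. The sketch below applies only these two checks under that assumption. It is not the statistical procedure used in this study, and the function names and tolerances (`cv_tol`, `acf_tol`) are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Optional


def interarrival_times(arrivals: List[float]) -> List[float]:
    """Gaps between consecutive (sorted) session start times."""
    a = sorted(arrivals)
    return [later - earlier for earlier, later in zip(a, a[1:])]


def coefficient_of_variation(xs: List[float]) -> float:
    """Sample std / mean; close to 1 for exponential inter-arrivals."""
    m = sum(xs) / len(xs)
    if m == 0:
        raise ValueError("mean inter-arrival time is zero")
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(var) / m


def lag1_autocorrelation(xs: List[float]) -> float:
    """Sample lag-1 autocorrelation; close to 0 for independent gaps."""
    m = sum(xs) / len(xs)
    den = sum((x - m) ** 2 for x in xs)
    if den == 0:
        return float("nan")  # undefined for a constant series
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    return num / den


def looks_poisson(arrivals: List[float], cv_tol: float = 0.1,
                  acf_tol: Optional[float] = None) -> bool:
    """Rough check: CV near 1 and lag-1 autocorrelation near 0."""
    gaps = interarrival_times(arrivals)
    if len(gaps) < 3:
        raise ValueError("need at least 4 arrivals")
    if acf_tol is None:
        acf_tol = 3 / math.sqrt(len(gaps))  # approx. 99.7% band for white noise
    return (abs(coefficient_of_variation(gaps) - 1) < cv_tol
            and abs(lag1_autocorrelation(gaps)) < acf_tol)


def simulate_poisson(rate: float, n: int, seed: int = 0) -> List[float]:
    """Arrival times of a homogeneous Poisson process (exponential gaps)."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        out.append(t)
    return out
```

Large samples produced by `simulate_poisson` should pass the check, whereas strictly periodic arrivals (CV of 0) or alternating short and long gaps (lag-1 autocorrelation near -1) fail it.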