Web User Session Reconstruction Using Integer Programming

Web usage mining takes Web user sessions as an important input. When such sessions are not otherwise identified, they must be reconstructed from Web logs, a process known as sessionization. We present a novel approach for sessionization based on an integer program. We compare the results of our approach with those of the timeout heuristic on Web logs from an academic Web site. We find that the sessions produced by our integer program better match an expected empirical distribution, with about half the standard error of the heuristic.
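For readers unfamiliar with the baseline, the following is a minimal sketch of one common variant of the timeout heuristic, not the implementation evaluated in this work. It assumes log records arrive as hypothetical `(user_key, timestamp_seconds, page)` tuples and that a new session starts whenever the gap between consecutive requests from the same user exceeds a threshold. The function name `timeout_sessions` and the 30-minute default `max_gap` are illustrative assumptions.

```python
from collections import defaultdict


def timeout_sessions(records, max_gap=1800):
    """Split each user's requests into sessions using a gap-based timeout.

    records: iterable of (user_key, timestamp_seconds, page) tuples
             (an assumed layout, not taken from the paper).
    max_gap: assumed threshold in seconds; a larger gap starts a new session.
    Returns a list of (user_key, [page, ...]) tuples.
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sessions = []
    for user, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order each user's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > max_gap:
                # Gap too long: close the current session and start a new one.
                sessions.append((user, [p for _, p in current]))
                current = []
            current.append(hit)
        sessions.append((user, [p for _, p in current]))
    return sessions


if __name__ == "__main__":
    log = [("a", 0, "/"), ("a", 100, "/x"), ("a", 5000, "/y"), ("b", 50, "/")]
    for user, pages in timeout_sessions(log):
        print(user, pages)
```

A heuristic of this kind decides each session boundary locally from timestamps alone, whereas an integer program can weigh all assignments of requests to sessions jointly.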
