Web Customer Modeling for Automated Session Prioritization on High Traffic Sites

In the Web environment, user identification is becoming a major challenge for admission control systems on high-traffic sites. When a web server is overloaded, throughput measured in completed sessions drops significantly more than throughput measured in responses per second. Longer sessions are usually the ones that end in sales, but they are also the most sensitive to failures under load. Session-based admission control systems maintain a high QoS for a limited number of sessions. However, they do not maximize revenue, because they treat all non-logged-in sessions the same. We present a novel method for learning to assign priorities to sessions according to the revenue they will generate. To this end, we use traditional machine learning techniques and Markov-chain models. We are able to train a system to estimate the probability of a user's purchasing intention from the user's early navigation clicks and other static information. Admission control systems can use these predictions to prioritize sessions, or to deny them when no resources are available, thus improving sales throughput per unit of time for a given infrastructure. We test our approach on access logs obtained from a high-traffic online travel agency, with promising results.
