Validation and interpretation of Web users' sessions clusters

Understanding users' navigation on the Web is important for improving the quality of information and the speed of access to large-scale Web data sources. Clustering users' navigation sessions has been proposed to identify patterns and similarities, which are then exploited in user-oriented Web applications (searching, e-commerce, etc.). This paper addresses the problem of assessing the quality of user session clusters in order to make inferences about users' navigation behavior. A common model-based clustering algorithm is used to produce clusters of Web users' sessions. These clusters are validated with a statistical test that measures the distances between the clusters' distributions to infer how dissimilar and how distinguishable the clusters are. Furthermore, a visualization method is proposed to interpret the relations between clusters. Using real data sets, we illustrate how the proposed analysis can be applied in popular application scenarios to reveal valuable associations among Web users' navigation sessions.
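As a rough illustration of measuring the distance between two clusters' distributions, the sketch below computes a chi-square homogeneity statistic over per-category page-visit counts. The abstract does not name the statistical test used, so the chi-square choice, the function name `chi_square_distance`, and the count-vector representation of a cluster are all assumptions made for illustration, not the paper's method.

```python
from typing import Sequence


def chi_square_distance(counts_a: Sequence[float], counts_b: Sequence[float]) -> float:
    """Chi-square homogeneity statistic between two clusters.

    Each argument is a hypothetical vector of visit counts per page
    category for one cluster (an assumed representation). A value of 0
    means the two clusters have identical category proportions; larger
    values mean the clusters are more dissimilar.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both clusters must use the same categories")

    total_a = sum(counts_a)
    total_b = sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each cluster needs at least one visit")
    grand_total = total_a + total_b

    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            # Category unused by both clusters contributes nothing.
            continue
        # Expected counts if both clusters shared one distribution.
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic


# Two clusters with the same proportions versus two disjoint clusters.
same = chi_square_distance([10, 20, 30], [20, 40, 60])
disjoint = chi_square_distance([30, 0], [0, 30])
```

In this sketch, a statistic near zero would suggest that two clusters are hard to distinguish, while a large value would suggest well-separated clusters. Comparing the statistic against a chi-square critical value would turn it into a formal test.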
