Augmented intuitive dissimilarity metric for clustering of Web user sessions

Clustering is a useful technique for categorising Web users with common browsing activities, access patterns and navigational behaviour. Web user clustering is used to build Web visitor profiles, which form the core of a personalised information recommender system. Such systems interpret Web users' surfing activities in order to offer tailored content to users with similar interests. The principal objective of Web user session clustering is to maximise intra-group similarity while minimising inter-group similarity. Efficient clustering of Web user sessions depends not only on the nature of the clustering algorithm but also on how well user concerns are captured and accommodated by the dissimilarity measure that is used. Choosing a dissimilarity measure that captures the access behaviour of the Web user is therefore essential for meaningful clustering. In this paper, an intuitive dissimilarity measure is presented to estimate a Web user's concern from augmented Web user sessions. The proposed usage dissimilarity measure between two Web user sessions is based on the relevance of the accessed pages, the syntactic structure of the page URLs and the hierarchical structure of the website. The proposed measure was used with the K-Medoids clustering algorithm for experimentation, and the results were compared with those of other independent dissimilarity measures. The quality of the generated clusters was evaluated with two unsupervised cluster validity indexes. The experimental results show that the intuitive augmented session dissimilarity measure is more realistic than, and superior to, the other independent dissimilarity measures in terms of the cluster validity indexes.
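The abstract does not state the formula of the proposed measure, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that a session is a mapping from page URL to a relevance weight (a hypothetical stand-in for page relevance), approximates the website hierarchy by the URL path, and feeds the resulting dissimilarity matrix to a basic K-Medoids loop. The function names `url_dissimilarity`, `session_dissimilarity` and `k_medoids` are invented for this example.

```python
from urllib.parse import urlparse


def url_dissimilarity(url_a, url_b):
    """Assumed measure: 1 - (shared leading path segments / longer path length)."""
    seg_a = [s for s in urlparse(url_a).path.split("/") if s]
    seg_b = [s for s in urlparse(url_b).path.split("/") if s]
    longest = max(len(seg_a), len(seg_b))
    if longest == 0:
        return 0.0  # both URLs point at the site root
    shared = 0
    for a, b in zip(seg_a, seg_b):
        if a != b:
            break
        shared += 1
    return 1.0 - shared / longest


def session_dissimilarity(s1, s2):
    """Sessions are dicts {url: relevance weight} (an assumption of this sketch).

    Each page is matched to its closest page in the other session; the
    distances are averaged with the relevance weights, then symmetrised.
    """
    def directed(src, dst):
        total = sum(src.values())
        return sum(
            w * min(url_dissimilarity(u, v) for v in dst)
            for u, w in src.items()
        ) / total

    return 0.5 * (directed(s1, s2) + directed(s2, s1))


def k_medoids(dist, k, max_iter=100):
    """Basic alternating K-Medoids on a precomputed dissimilarity matrix."""
    n = len(dist)
    medoids = list(range(k))  # deterministic start: the first k sessions

    def assign(meds):
        # Each session joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy sessions with made-up URLs and weights.
sessions = [
    {"/products/laptops/a": 2.0, "/products/laptops/b": 1.0},
    {"/products/laptops/c": 1.0},
    {"/blog/2020/post": 1.0},
    {"/blog/2021/post": 1.0, "/blog/2020/other": 1.0},
]
matrix = [[session_dissimilarity(a, b) for b in sessions] for a in sessions]
medoids, labels = k_medoids(matrix, k=2)
```

On this toy data the two product sessions end up in one cluster and the two blog sessions in the other. A cluster validity index could then be computed from `matrix` and `labels` to compare alternative dissimilarity measures.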
