Alignment Based Similarity Distance Measure for Better Web Sessions Clustering

The growth of the internet and the popularity of the web have drawn considerable research attention to web usage mining. The amount of data available on the web is growing exponentially, and the required information is often not immediately accessible. Web usage mining extracts useful information from the large volume of data in web logs, which record the web pages accessed. Because of this volume, it is better to handle small groups of data at a time than to process the entire data set at once. Clustering the data requires a similarity measure that gives the distance between any two user sessions. The objective of this paper is to propose a technique that measures the similarity between any two user sessions using sequence alignment based on dynamic programming.
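As an illustration of the general idea, the following is a minimal sketch of a global sequence alignment computed by dynamic programming, applied to two sessions represented as ordered lists of page identifiers. It is not the measure proposed in the paper: the function names, the scoring values (match = 2, mismatch = -1, gap = -1), and the normalisation by the longer session length are all assumptions chosen for the example.

```python
from typing import Hashable, Sequence


def alignment_score(a: Sequence[Hashable], b: Sequence[Hashable],
                    match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of two page sequences (assumed scoring scheme)."""
    n, m = len(a), len(b)
    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: aligning a[:i] with an empty prefix of `b` costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                          prev[j] + gap,        # gap in b
                          curr[j - 1] + gap)    # gap in a
        prev = curr
    return prev[m]


def session_similarity(a: Sequence[Hashable], b: Sequence[Hashable],
                       match: int = 2, mismatch: int = -1, gap: int = -1) -> float:
    """Similarity in [0, 1]: score divided by the best possible score for the
    longer session, with negative scores clipped to 0 (an assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sessions are treated as identical
    score = alignment_score(a, b, match, mismatch, gap)
    return max(0.0, score / (match * longest))


# Hypothetical sessions: ordered page identifiers taken from a web log.
s1 = ["home", "products", "cart", "checkout"]
s2 = ["home", "products", "checkout"]
similarity = session_similarity(s1, s2)
distance = 1.0 - similarity  # a distance that a clustering algorithm could consume
```

Under these assumed scores, the two example sessions align with three matches and one gap. A clustering algorithm would then operate on the pairwise distances rather than on the raw sessions.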