Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming

With the rapidly increasing popularity of the WWW, websites play a crucial role in conveying knowledge to end users. Every request to a website, or transaction on the server, is recorded in a file called the server log file. Providing Web administrators with meaningful information about user access behavior (also called clickstream data) has become a necessity for improving the quality of Web information and service performance. The hidden knowledge obtained by mining Web server traffic data and user access patterns (called Web Usage Mining) can therefore be used directly for the marketing and management of e-business, e-services, e-searching, e-education and so on. Categorizing visitors or users based on their interaction with a website is a key problem in Web usage mining. The clickstreams generated by different users often follow distinct patterns, and clustering these access patterns yields knowledge that may help in applications such as recommender systems, finding the learning patterns of users in e-learning systems, finding groups of visitors with similar interests, providing customized content in a site manager, and categorizing customers in e-shopping. Given session information, this paper focuses on a method for finding session similarity by sequence alignment using dynamic programming, and proposes a similarity matrix as a model for representing session similarity measures. The work follows the agglomerative hierarchical clustering method to cluster the similarity matrix so as to group similar sessions, and the clustering process is depicted in a dendrogram.
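
The pipeline summarized above (align sessions with dynamic programming, fill a similarity matrix, then cluster it agglomeratively) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the scoring constants `MATCH`, `MISMATCH` and `GAP`, the normalization in `similarity`, the choice of average linkage, and the sample sessions are all assumptions made only for illustration.

```python
from itertools import combinations

# Assumed scoring values for illustration; the abstract does not specify them.
MATCH, MISMATCH, GAP = 2, -1, -1


def alignment_score(a, b):
    """Global alignment score of two page sequences via dynamic programming."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: score divided by the best possible score of the
    longer session, clamped at 0 so the result lies in [0, 1]."""
    if not a and not b:
        return 1.0
    best = MATCH * max(len(a), len(b))
    return max(0.0, alignment_score(a, b) / best)


def similarity_matrix(sessions):
    """Symmetric matrix of pairwise session similarities."""
    n = len(sessions)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = similarity(sessions[i], sessions[j])
    return sim


def agglomerate(sim):
    """Average-linkage agglomerative clustering on a similarity matrix.
    Returns the merge history, which is the information a dendrogram draws."""
    clusters = {i: [i] for i in range(len(sim))}
    merges = []

    def avg(p, q):
        pairs = [(x, y) for x in clusters[p] for y in clusters[q]]
        return sum(sim[x][y] for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the highest average similarity.
        p, q = max(combinations(sorted(clusters), 2), key=lambda pq: avg(*pq))
        merges.append((sorted(clusters[p]), sorted(clusters[q]), avg(p, q)))
        clusters[p] = clusters[p] + clusters.pop(q)
    return merges


# Hypothetical sessions: each is the ordered list of pages a visitor requested.
sessions = [
    ["home", "catalog", "item", "cart"],
    ["home", "catalog", "item", "checkout"],
    ["home", "help", "contact"],
    ["help", "contact"],
]

for left, right, score in agglomerate(similarity_matrix(sessions)):
    print(left, "+", right, "at similarity", score)
```

With these assumed values, the two shopping sessions merge first, the two help sessions merge next, and the two resulting groups join last. Reading the merge history from first to last gives the bottom-up order in which a dendrogram would join the sessions.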
