Finding the most evident co-clusters on web log dataset using frequent super-sequence mining

Mining web log datasets can reveal interesting and useful information. Web log data supports three kinds of mining: web usage mining, web structure mining, and web content mining. In our research, we investigate web page structure and find the most evident groups of users and web pages. With web logs now growing to very large sizes, it is not always necessary to partition every user in a web log dataset into clusters; sometimes it is more important to find the dominant user groups and their corresponding web pages. In this paper, we investigate a new way to search for the most evident co-clusters of users and their corresponding web pages in a web log dataset using a frequent super-sequence mining technique. Our experiments yield interesting results.
