Paper Information - Mining WWW Access Sequence by Matrix Clustering

Mining WWW Access Sequence by Matrix Clustering

Sequence pattern mining is one of the most important methods for mining WWW access logs. The Apriori algorithm is a well-known, typical algorithm for sequence pattern mining. However, it has inherent difficulty finding long sequential patterns and extracting interesting patterns from a huge number of results. This article proposes a new method for finding generalized sequence patterns by matrix clustering. The method decomposes a sequence into a set of sequence elements, each of which corresponds to an ordered pair of items. Matrix clustering is then applied to extract a cluster of similar sequences, and the resulting sequence elements are composed into a generalized sequence. The method is evaluated on a practical WWW access log; the evaluation shows that it is practically useful for finding long sequences and for presenting the generalized sequence as a graph.
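
The decomposition and composition steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a sequence element is an ordered pair of consecutive items (the abstract says only "an ordered pair of items"), and it omits the matrix clustering step itself, which the abstract does not detail. The cluster passed to the composition step is picked by hand. The function names `decompose`, `build_matrix`, and `compose_graph` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def decompose(sequence):
    """Return the set of sequence elements of one access sequence.

    Assumption: an element is an ordered pair of consecutive items.
    """
    return {(a, b) for a, b in zip(sequence, sequence[1:])}


def build_matrix(sequences):
    """Build a binary sequence-by-element matrix.

    Rows are sequences, columns are sequence elements; a cell is 1 when
    the sequence contains that element. A matrix clustering step would
    take this matrix as its input.
    """
    elements = sorted(set().union(*(decompose(s) for s in sequences)))
    matrix = [
        [1 if e in decompose(s) else 0 for e in elements]
        for s in sequences
    ]
    return elements, matrix


def compose_graph(elements):
    """Compose sequence elements into a directed graph (adjacency lists)."""
    graph = defaultdict(list)
    for a, b in sorted(elements):
        graph[a].append(b)
    return dict(graph)


# Hypothetical access sessions, one list of pages per visitor.
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "shop"],
]

elements, matrix = build_matrix(sessions)

# Hand-picked stand-in for a cluster of similar sequences (sessions 0 and 1);
# a real matrix clustering algorithm would select rows and columns itself.
cluster_elements = decompose(sessions[0]) | decompose(sessions[1])
generalized = compose_graph(cluster_elements)
```

In this sketch the first two sessions share the element `("home", "news")`, so composing their elements gives a graph that branches after `news`, which is the kind of generalized sequence the abstract describes presenting as a graph.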
