Application of Matrix Clustering to Web Log Analysis and Access Prediction

Matrix clustering is a new data mining method that extracts a dense sub-matrix from a large, sparse binary matrix. We propose an efficient algorithm, the ping-pong algorithm, which enables real-time mining of a large sparse matrix. This article describes the application of matrix clustering to Web usage mining. Matrix clustering can be applied to Web access log analysis by representing the relationships between pages and users as a binary matrix. An experiment with a practical WWW access log shows that page clusters can be extracted by applying matrix clustering. The extracted page clusters are compared with those obtained by association rule mining, and the results show that matrix clustering is more effective at finding various types of page clusters. The page clusters extracted by matrix clustering can also be applied to Web access prediction. We compare matrix clustering with association rule mining and sequence pattern mining with respect to access prediction; the results show that matrix clustering achieves a higher hit rate than these methods when the session length is long.
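As a rough illustration of the representation described above, the following Python sketch builds a sparse binary user-by-page matrix from access records and extracts a dense sub-matrix by alternating between a row step and a column step. It is a minimal sketch under stated assumptions, not the ping-pong algorithm itself, whose details are not given in this abstract: the function names (`build_rows`, `density`, `alternate_refine`), the `threshold` parameter, and the `(user, page)` record format are all hypothetical.

```python
from collections import defaultdict


def build_rows(records):
    """Turn (user, page) log records into a sparse binary matrix: user -> set of pages."""
    rows = defaultdict(set)
    for user, page in records:
        rows[user].add(page)
    return dict(rows)


def density(rows, users, pages):
    """Fraction of 1-cells in the sub-matrix spanned by `users` x `pages`."""
    if not users or not pages:
        return 0.0
    ones = sum(len(rows[u] & pages) for u in users)
    return ones / (len(users) * len(pages))


def alternate_refine(rows, seed_user, threshold=0.5, max_iter=20):
    """Illustrative alternating refinement (an assumption, not the authors' method).

    Starts from the pages of one seed user and alternates between selecting
    users and selecting pages until the selection stops changing.
    """
    users, pages = {seed_user}, set(rows[seed_user])
    for _ in range(max_iter):
        # Row step: keep users who visited at least `threshold` of the current pages.
        new_users = {u for u, visited in rows.items()
                     if len(visited & pages) >= threshold * len(pages)}
        # Column step: count visits per page among those users, keep frequent pages.
        counts = defaultdict(int)
        for u in new_users:
            for page in rows[u]:
                counts[page] += 1
        new_pages = {p for p, c in counts.items() if c >= threshold * len(new_users)}
        if not new_users or not new_pages:
            break
        if new_users == users and new_pages == pages:
            break
        users, pages = new_users, new_pages
    return users, pages
```

Calling `alternate_refine(build_rows(records), seed_user)` returns a set of users and a set of pages; the page set plays the role of a page cluster, and `density` reports how full the corresponding sub-matrix is. Raising `threshold` trades cluster size for density.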