Scalability of OAT

Summary form only given. Mining user access patterns from clickstream data has attracted much attention from the research community. However, the scalability testing of the corresponding mining algorithms has been virtually ignored. The memory requirements of these algorithms may be quite large because they often assume in-memory data structures whose size depends on the number and length of patterns. Because scalability is essential to the usefulness of Web usage mining (WUM) techniques, we propose two new sampling techniques, continuous and random, which can be applied to static-sized test datasets to examine WUM algorithm scalability. We illustrate the usefulness of these approaches by performing scalability tests on the online adaptive traversal (OAT) pattern mining algorithm. These experiments show that the OAT algorithm adapts to the amount of available memory and that its time requirements grow linearly. This paper has three main results:

1. The OAT algorithm is shown to be scalable in both space and time. Time grows linearly, while space adapts to available memory through compression.
2. Two sampling techniques are presented that make it possible to perform scalability experiments on fixed-size Web logs.
3. Spiders crawling the Web can have a disastrous impact on programs that collect WUM statistics and patterns.
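The abstract does not define the two sampling techniques in detail, so the Python sketch below illustrates one plausible reading, which is an assumption rather than the authors' method. Here "continuous" sampling keeps a contiguous prefix of the log covering a given fraction, and "random" sampling draws the same number of records uniformly at random without replacement, then restores their original order. Each sample size is then fed to a miner while time and peak Python heap usage are recorded. The names `continuous_sample`, `random_sample`, and `measure` are hypothetical, the unit of sampling (a record such as a session) is assumed, and the placeholder miner only stands in for OAT, which is not reproduced here.

```python
import random
import time
import tracemalloc
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _sample_size(log: Sequence[T], fraction: float) -> int:
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    # round() avoids losing a record to float error, e.g. 100 * 0.29 -> 28.99...
    return round(len(log) * fraction)


def continuous_sample(log: Sequence[T], fraction: float) -> List[T]:
    """Hypothetical continuous sampling: one contiguous run from the start of
    the log, so temporal order and adjacent requests are preserved."""
    return list(log[:_sample_size(log, fraction)])


def random_sample(log: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Hypothetical random sampling: records chosen uniformly without
    replacement, returned in their original log order."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    chosen = sorted(rng.sample(range(len(log)), _sample_size(log, fraction)))
    return [log[i] for i in chosen]


def measure(mine: Callable[[List[T]], object], sample: List[T]) -> Tuple[float, int]:
    """Run a miner on one sample; return (seconds, peak traced bytes)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        mine(sample)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return elapsed, peak


if __name__ == "__main__":
    log = [f"session-{i}" for i in range(10)]
    print(continuous_sample(log, 0.2))  # ['session-0', 'session-1']
    # Placeholder miner (hypothetical): counts distinct sessions instead of running OAT.
    placeholder_miner = lambda s: len(set(s))
    for frac in (0.2, 0.5, 1.0):
        for name, sample in (("continuous", continuous_sample(log, frac)),
                             ("random", random_sample(log, frac, seed=42))):
            secs, peak = measure(placeholder_miner, sample)
            print(f"{name:10s} {frac:.1f} n={len(sample)} {secs:.6f}s {peak}B")
```

Plotting the recorded times against sample size for increasing fractions is one way to check for the linear growth the abstract reports; the continuous variant preserves time-ordered structure such as bursts of spider traffic, while the random variant spreads such bursts across samples.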
