Comparison of Interestingness Measures for Web Usage Mining: an Empirical Study

A common problem in mining association rules or sequential patterns is that a large number of rules or patterns can be generated from a database, making it impossible for a human analyst to digest the results. Solutions to the problem include, among others, using interestingness measures to identify interesting rules or patterns and pruning rules that are considered redundant. Various interestingness measures have been proposed, but little work has been reported on the effectiveness of these measures in real-world applications. We present an application of Web usage mining to a large collection of Livelink log data. Livelink is a web-based product of Open Text Corporation that provides automatic management and retrieval of different types of information objects over an intranet, an extranet or the Internet. We report our experience in preprocessing raw log data, mining association rules and sequential patterns from the log data, and identifying interesting rules and patterns using interestingness measures and some pruning methods. In particular, we evaluate a number of interestingness measures in terms of their effectiveness in finding interesting association rules and sequential patterns. Our results show that some measures are much more effective than others.
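To make the idea of scoring a rule with an interestingness measure concrete, here is a minimal sketch in Python. The abstract does not name the measures that were evaluated, so the three computed here (support, confidence and lift) are assumed as common textbook examples rather than the study's own selection. The function name `rule_measures` and the toy sessions are hypothetical and are not drawn from the Livelink logs.

```python
def rule_measures(sessions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `sessions` is a list of sets; each set holds the pages seen in one session.
    """
    a, c = set(antecedent), set(consequent)
    n = len(sessions)
    n_a = sum(1 for s in sessions if a <= s)         # sessions containing the antecedent
    n_c = sum(1 for s in sessions if c <= s)         # sessions containing the consequent
    n_ac = sum(1 for s in sessions if (a | c) <= s)  # sessions containing both sides

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical sessions, invented purely for illustration.
sessions = [
    {"home", "search", "doc"},
    {"home", "doc"},
    {"home", "search"},
    {"search", "doc"},
]

support, confidence, lift = rule_measures(sessions, {"search"}, {"doc"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

Ranking every mined rule by one such score and keeping only the top of the list is one simple way an analyst could cut a large rule set down to a size that can be read. A lift below 1, as in this toy case, suggests that the antecedent makes the consequent slightly less likely than its baseline rate.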
