Mining Association Rules in Hypertext Databases

We generalise the notion of an association rule, defined over flat transactions, to that of a composite association rule defined over a structured directed graph such as the World Wide Web. The proposed techniques aim to find patterns in user behaviour as users traverse such a hypertext system. We redefine the concepts of confidence and support for composite association rules and propose two algorithms to mine such rules. Extensive experiments with random data show that, although worst-case analysis indicates exponential behaviour, in practice the algorithms' complexity, measured as the number of iterations performed, is linear in the number of nodes traversed.
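For orientation, the classical flat-transaction notions that the abstract starts from can be sketched as follows. This is only the standard baseline (support as the fraction of transactions containing an itemset, confidence as a ratio of two supports), not the composite definitions introduced in the paper, which the abstract does not spell out. The function names and the toy sessions are hypothetical, and each session is assumed to be reduced to the set of pages visited, ignoring link order.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Classical confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    denom = support(transactions, a)
    # A rule whose antecedent never occurs has no meaningful confidence;
    # returning 0.0 here is a convention chosen for this sketch.
    return support(transactions, both) / denom if denom else 0.0


# Hypothetical sessions: each is the set of pages one user visited.
sessions = [
    frozenset({"home", "news"}),
    frozenset({"home", "news", "sport"}),
    frozenset({"home", "sport"}),
    frozenset({"news"}),
]

rule_support = support(sessions, {"home", "news"})
rule_confidence = confidence(sessions, {"home"}, {"news"})
```

In this toy data, two of the four sessions contain both "home" and "news", and three contain "home", so the rule home -> news has support 2/4 and confidence 2/3. Treating a session as an unordered set discards the link structure, which is the information a graph-based generalisation is meant to retain.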
