Discovery of Interesting Usage Patterns from Web Data

Web Usage Mining is the application of data mining techniques to large Web data repositories in order to extract usage patterns. As in many data mining application domains, generating patterns is not enough; the patterns that are actually interesting must also be identified. A necessary step in identifying interesting results is quantifying what is considered uninteresting, in order to form a basis for comparison. Several research efforts have relied on manually generated sets of uninteresting rules. However, manually generating a comprehensive set of evidence about beliefs for a particular domain is impractical in many cases. In general, domain knowledge can be used to automatically create evidence for or against a set of beliefs. This paper develops a quantitative model based on support logic for determining the interestingness of discovered patterns. For Web Usage Mining, three types of domain information are available: usage, content, and structure. This paper also describes algorithms for using these three types of information to automatically identify interesting knowledge. These algorithms have been incorporated into the Web Site Information Filter (WebSIFT) system, and examples of interesting frequent itemsets automatically discovered from real Web data are presented.
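The idea of comparing usage evidence against structure evidence can be illustrated with a minimal sketch. It is not the support-logic model or the WebSIFT algorithms themselves: it assumes a simple binary notion of structural evidence (two pages are either directly hyperlinked or not), and the function names, sessions, and link list are hypothetical. A frequent page pair is flagged as interesting when usage supports it but site structure does not.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Return {pair: support} for page pairs whose support meets min_support.

    Support is the fraction of sessions that contain both pages.
    """
    counts = Counter()
    for session in sessions:
        # Count each unordered pair at most once per session.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    total = len(sessions)
    return {pair: c / total for pair, c in counts.items() if c / total >= min_support}


def interesting_pairs(sessions, links, min_support):
    """Keep frequent pairs that have no direct hyperlink in either direction.

    Assumption: a missing link is treated as structural evidence against
    the pages being used together, so frequent co-occurrence is unexpected.
    """
    linked = {frozenset(edge) for edge in links}
    return {
        pair: support
        for pair, support in frequent_pairs(sessions, min_support).items()
        if frozenset(pair) not in linked
    }


# Hypothetical usage data (page views per session) and site structure (hyperlinks).
sessions = [["A", "B", "C"], ["A", "C"], ["B", "C"], ["A", "C", "D"]]
links = [("A", "B"), ("B", "C")]

result = interesting_pairs(sessions, links, min_support=0.5)
print(result)
```

In this toy data, pages A and C co-occur in three of four sessions yet share no direct link, so that pair is the one reported. A graded model would replace the yes/no link test with degrees of evidence for and against each belief.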
