Web access patterns can help website designers make web-based communication more efficient. To extract interesting or useful web access patterns, we use data mining techniques that analyze historical web access logs. In this paper, we present an efficient approach to mining the most interesting web access associations, where "interesting" denotes patterns that are supported by a high fraction of access activities with strong confidence. Our approach consists of three steps: 1) transform raw web logs into a relational table; 2) convert the relational table into a collection of access transactions; 3) mine the transaction collection to extract associations and rules. In steps 1 and 2, we provide users with an effective mechanism for generating only "interesting" access records and transactions for mining. In step 3, we present a new, efficient data mining algorithm that finds the most interesting web access associations. We evaluate this approach on both synthetic data sets and real web logs, and show that the proposed mining methods are effective, efficient, and scalable.
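The three steps can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: the abstract does not specify the log format, the filtering mechanism, or the mining procedure, so everything below is an assumption made for illustration. In particular, the sketch assumes Common Log Format input, keeps only successful GET requests as the "interesting" records, splits each host's requests into transactions after a 30-minute idle gap, and substitutes brute-force counting of page pairs ranked by support for the mining step. The names `parse_logs`, `to_transactions`, and `mine_pairs` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations

# Assumed input: Common Log Format (the abstract does not name a format).
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')


def parse_logs(lines):
    """Step 1: raw log lines -> relational rows (host, time, url).

    Assumed filter: keep only successful GET requests.
    """
    rows = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like Common Log Format
        host, ts, method, url, status = m.groups()
        if method != "GET" or status != "200":
            continue
        when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        rows.append((host, when, url))
    return rows


def to_transactions(rows, gap=timedelta(minutes=30)):
    """Step 2: rows -> transactions (sets of URLs).

    Assumed rule: a host's requests belong to one transaction until the
    host stays idle for longer than `gap`.
    """
    transactions = []
    last_seen = {}  # host -> time of its previous request
    current = {}    # host -> URL set of its open transaction
    for host, when, url in sorted(rows):
        if host in last_seen and when - last_seen[host] > gap:
            transactions.append(current.pop(host))  # close the old transaction
        current.setdefault(host, set()).add(url)
        last_seen[host] = when
    transactions.extend(current.values())  # close whatever is still open
    return transactions


def mine_pairs(transactions, top_n=3):
    """Step 3 (stand-in): the top_n page pairs by support.

    Brute-force counting, used here only to show what support and
    confidence mean; it is not the paper's mining algorithm.
    """
    n = len(transactions)
    item_count = Counter(u for t in transactions for u in t)
    pair_count = Counter(
        p for t in transactions for p in combinations(sorted(t), 2)
    )
    results = []
    for (a, b), count in pair_count.most_common(top_n):
        results.append({
            "pair": (a, b),
            "support": count / n,                  # fraction of transactions with both pages
            "conf_a_to_b": count / item_count[a],  # estimate of P(b | a)
            "conf_b_to_a": count / item_count[b],  # estimate of P(a | b)
        })
    return results
```

Calling `mine_pairs(to_transactions(parse_logs(lines)))` on a list of log lines returns one dictionary per page pair. Ranking by support and reporting confidence alongside it mirrors the abstract's definition of "interesting", but the exact ranking criterion used in the paper may differ.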