Mining the Most Interesting Web Access Associations

Web access patterns can provide valuable information to website designers, helping them make website-based communication more efficient. To extract interesting or useful web access patterns, we use data mining techniques that analyze historical web access logs. In this paper, we present an efficient approach to mining the most interesting web access associations, where "interesting" denotes patterns that are supported by a high fraction of access activities and hold with strong confidence. Our approach consists of three steps: 1) transform raw web logs into a relational table; 2) convert the relational table into a collection of access transactions; 3) mine the transaction collection to extract associations and rules. In steps 1 and 2, we provide users with an effective mechanism for generating only "interesting" access records and transactions for mining. In step 3, we present a new, efficient data mining algorithm for finding the most interesting web access associations. We evaluate this approach on both synthetic data sets and real web logs and demonstrate the efficacy, efficiency, and good scalability of the proposed mining methods.
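
To make the notions of support and confidence concrete, the following is a minimal illustrative sketch of steps 2 and 3 in Python. It is not the mining algorithm proposed in this paper: it simply counts page pairs by brute force. The record layout `(client, timestamp_seconds, url)`, the sample data, the 30-minute session gap, and the function names `to_transactions` and `pair_rules` are all assumptions introduced for illustration, and step 1 (parsing raw logs into records) is assumed to have been done already.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already-parsed access records: (client, timestamp_seconds, url).
RECORDS = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.1", 30, "/products"),
    ("10.0.0.1", 60, "/cart"),
    ("10.0.0.2", 10, "/home"),
    ("10.0.0.2", 50, "/products"),
    ("10.0.0.3", 5, "/home"),
    ("10.0.0.3", 4000, "/products"),
    ("10.0.0.3", 4020, "/cart"),
]


def to_transactions(records, gap=1800):
    """Step 2 (assumed rule): split each client's requests into sessions
    wherever the idle time between two requests exceeds `gap` seconds."""
    by_client = defaultdict(list)
    for client, ts, url in records:
        by_client[client].append((ts, url))

    transactions = []
    for visits in by_client.values():
        visits.sort()  # order each client's requests by timestamp
        current, last_ts = set(), None
        for ts, url in visits:
            if last_ts is not None and ts - last_ts > gap:
                transactions.append(frozenset(current))
                current = set()
            current.add(url)
            last_ts = ts
        transactions.append(frozenset(current))
    return transactions


def pair_rules(transactions, min_support, min_confidence):
    """Step 3 (brute force): return rules lhs -> rhs over page pairs as
    (lhs, rhs, support, confidence), keeping only those above both thresholds."""
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t in transactions:
        for item in t:
            item_count[item] += 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of transactions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[2], -r[3], r[0], r[1]))


if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(to_transactions(RECORDS), 0.5, 0.8):
        print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

On this sample data the sketch yields four transactions and a single rule, `/cart -> /products`, with support 0.50 and confidence 1.00. Exhaustive pair counting is practical only for small inputs, which is why a dedicated mining algorithm is needed for real logs.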