Mining Web Access Sequence with Improved Apriori Algorithm

The Apriori algorithm is a classic mining algorithm that can mine association rules and sequential patterns. However, it is inefficient when applied to contiguous sequential pattern mining. In web log mining, contiguous sequential patterns better represent the semantics of a user's access to a site, because a user visits the pages of a site in a continuous sequence. Contiguous sequential patterns can be used not only to predict the user's next access request, but also to improve the site topology and to decide where to place advertising pages. When mining contiguous sequential patterns, the Apriori algorithm generates a large number of candidates and scans the transaction database repeatedly. In this paper, we present an improved algorithm based on Apriori, which we call the AC-Apriori algorithm. AC-Apriori reduces the number of scans of the transaction database while producing the same mining results, which reduces the runtime and improves mining efficiency compared with the Apriori algorithm.
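To make the cost described above concrete, the following is a minimal sketch of baseline level-wise (Apriori-style) mining of contiguous sequential patterns. It is not the AC-Apriori algorithm, whose details are not given in this abstract. The function names, the session format (a list of page identifiers per user session), and the definition of support as the number of sessions containing the pattern are assumptions made for illustration.

```python
from collections import Counter


def contiguous_subsequences(session, k):
    """Yield every run of k consecutive pages in one session."""
    for i in range(len(session) - k + 1):
        yield tuple(session[i:i + k])


def mine_contiguous_patterns(sessions, min_support):
    """Level-wise mining of frequent contiguous sequences (illustrative baseline).

    Support is assumed to be the number of sessions that contain the
    pattern as a run of consecutive pages. Returns {pattern: support}.
    """
    frequent = {}
    prev_level = None
    k = 1
    while True:
        counts = Counter()
        # One full pass over the session database per pattern length:
        # this repeated scanning is the cost the abstract refers to.
        for session in sessions:
            seen = set()
            for cand in contiguous_subsequences(session, k):
                # Apriori pruning: a length-k run can only be frequent if
                # its length-(k-1) prefix and suffix are both frequent.
                if prev_level is not None and (
                    cand[:-1] not in prev_level or cand[1:] not in prev_level
                ):
                    continue
                seen.add(cand)
            counts.update(seen)  # count each pattern once per session
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        prev_level = level
        k += 1
    return frequent


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C", "A", "B"]]
patterns = mine_contiguous_patterns(sessions, min_support=2)
```

In this sketch the database is scanned once for every pattern length, including a final pass that finds nothing, which is the kind of overhead a reduced-scan variant aims to avoid.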
