An Efficient Algorithm for Extracting High Utility Itemsets from Weblog Data

ABSTRACT
A high utility itemset is a set of items that yields high utility, such as profit, in a database. High utility itemsets play a crucial role in real-life applications. In recent years, various algorithms have been proposed for finding high utility itemsets, but they are not fully efficient in terms of time and space. In data mining, high utility itemsets can be mined from different categories of data, such as time-series and categorical data. Log data is useful for characterizing various aspects of user behaviour. In this paper, we propose an algorithm named HUIM (High Utility Itemsets Mining) and construct an HUI-FP (High Utility Itemsets-Frequent Pattern) tree for efficiently mining high utility itemsets from a log database. User behaviour can be predicted from the utility of each visited page. We also propose a pattern generation technique based on the cosine similarity among itemsets. These techniques generate strong patterns and customize user profiles according to those patterns. The proposed algorithm outperforms the previous state-of-the-art algorithm for high utility itemset generation.
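The abstract does not spell out its utility model. The sketch below makes an assumption: it uses the definition common in the utility-mining literature. The utility of a page in a session is its internal quantity (assumed here to be the hit count) times its external utility (a profit-like weight). The utility of an itemset is summed over every session that contains all of its pages. The toy sessions, the `PAGE_UTILITY` weights, and all function names are hypothetical. The search is a naive exhaustive baseline, not the HUIM algorithm or the HUI-FP tree.

```python
from itertools import combinations

# Hypothetical toy weblog: each session maps a visited page to a quantity
# (assumed here to be the number of hits on that page in the session).
SESSIONS = [
    {"home": 2, "product": 1, "cart": 1},
    {"home": 1, "product": 3},
    {"product": 2, "cart": 1, "checkout": 1},
    {"home": 1, "checkout": 1},
]

# Hypothetical external utility (e.g. a profit weight) assigned to each page.
PAGE_UTILITY = {"home": 1, "product": 3, "cart": 5, "checkout": 10}


def session_utility(itemset, session, weights):
    """u(X, T): sum of quantity * external utility, or 0 if T lacks any page of X."""
    if not all(page in session for page in itemset):
        return 0
    return sum(session[page] * weights[page] for page in itemset)


def itemset_utility(itemset, sessions, weights):
    """u(X): total utility of X over all sessions that contain every page of X."""
    return sum(session_utility(itemset, s, weights) for s in sessions)


def high_utility_itemsets(sessions, weights, min_util):
    """Exhaustive baseline: every itemset whose utility is at least min_util."""
    pages = sorted({page for s in sessions for page in s})
    result = {}
    for k in range(1, len(pages) + 1):
        for itemset in combinations(pages, k):
            u = itemset_utility(itemset, sessions, weights)
            if u >= min_util:
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in high_utility_itemsets(SESSIONS, PAGE_UTILITY, 15).items():
        print(itemset, u)
```

With `min_util = 15`, `{cart}` alone has utility 10 and falls below the threshold. `{cart, checkout, product}` has utility 21 and passes it. Utility is therefore not anti-monotone, so Apriori-style support pruning cannot be reused directly. High utility mining algorithms instead rely on upper bounds such as transaction-weighted utility. The exhaustive loop examines all 2^n - 1 itemsets, which is exactly the cost that tree-based methods aim to avoid.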

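The abstract does not define how the cosine similarity among itemsets is computed. This sketch assumes one common reading. Each itemset is represented by a binary vector over sessions, with a 1 where the session contains every page of the itemset. Two itemsets are then compared by the cosine of their vectors. This reduces to |T(A) ∩ T(B)| / sqrt(|T(A)| · |T(B)|), where T(X) is the set of sessions containing X. The toy sessions, the `min_cos` threshold, and the function names are hypothetical. Treating pairs above the threshold as "strong patterns" is also an illustrative assumption, not the paper's exact criterion.

```python
import math
from itertools import combinations

# Hypothetical toy weblog: each session is the set of pages visited in it.
SESSIONS = [
    {"home", "product", "cart"},
    {"home", "product"},
    {"product", "cart", "checkout"},
    {"home", "checkout"},
]


def tidset(itemset, sessions):
    """T(X): indices of the sessions that contain every page in the itemset."""
    return {i for i, s in enumerate(sessions) if set(itemset) <= s}


def cosine(itemset_a, itemset_b, sessions):
    """Cosine of the binary session-occurrence vectors of two itemsets."""
    ta, tb = tidset(itemset_a, sessions), tidset(itemset_b, sessions)
    if not ta or not tb:
        return 0.0
    # Dot product of binary vectors = size of the intersection;
    # each vector's norm = sqrt of its number of ones.
    return len(ta & tb) / math.sqrt(len(ta) * len(tb))


def strong_pairs(itemsets, sessions, min_cos):
    """Pairs of itemsets whose cosine similarity is at least min_cos."""
    pairs = []
    for a, b in combinations(itemsets, 2):
        c = cosine(a, b, sessions)
        if c >= min_cos:
            pairs.append((a, b, round(c, 3)))
    return pairs


if __name__ == "__main__":
    singletons = [("home",), ("product",), ("cart",), ("checkout",)]
    print(strong_pairs(singletons, SESSIONS, 0.6))
```

Here `home`/`product` (2/3) and `product`/`cart` (2/sqrt(6), about 0.816) pass a threshold of 0.6, while every other pair scores 0.5 or less. The cosine ignores sessions that contain neither itemset, so adding unrelated sessions does not change a pair's score. This null-invariance is one reason the measure is used for interesting-pattern discovery.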