Extracting Non-redundant Correlated Purchase Behaviors by Utility Measure

Abstract: In web search and data mining, users’ click and purchase behaviors contain valuable information, and numerous approaches have been proposed to extract useful knowledge from them. In these real-life situations, a user may perform the same action or event multiple times, and different events yield different profits. Utility-oriented data mining approaches have therefore been studied extensively. Previous studies, however, rarely consider the inherent correlation among the items in a pattern, which limits the overall usefulness of the patterns they discover. For example, in purchase behavior, a low-utility item combined with a very high-utility item may be reported as a valuable pattern even though the items are not highly correlated. A more intelligent framework that provides non-redundant and correlated behaviors based on a utility measure is thus desired. In this paper, we first present a novel method to extract non-redundant correlated purchase behaviors that considers both utility and correlation. The resulting high-quality patterns have high profit and strong correlation, which can lead to higher recall and better precision. In the proposed projection-based approach, an efficient projection mechanism and a sorted downward closure property are developed to reduce the database size. Several pruning strategies are further developed to discover the desired patterns efficiently and effectively. An extensive experimental study shows that the novel non-redundant correlated high-utility patterns are more effective than the previous knowledge representation, and that the proposed algorithm is efficient in terms of execution time and memory usage.
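To make the motivating example concrete, the following minimal sketch shows how a pattern can have high utility yet weak correlation. It is not the paper's algorithm: the abstract does not name a correlation measure, so all-confidence (support of the itemset divided by the largest support of any single item in it) is assumed here purely for illustration. The toy database, the unit profits, the thresholds, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> purchased quantity

# Hypothetical unit profits and toy purchase database (illustrative only).
UNIT_PROFIT: Dict[str, int] = {"diamond": 500, "pen": 1, "bread": 2, "milk": 3}
DB: List[Transaction] = [
    {"diamond": 1, "pen": 1},
    {"pen": 2, "bread": 1},
    {"pen": 1, "milk": 2},
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"pen": 3},
]


def support(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset: FrozenSet[str], db: List[Transaction],
            profit: Dict[str, int]) -> int:
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if itemset.issubset(t)
    )


def all_confidence(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Assumed correlation measure: sup(X) / max single-item support in X."""
    max_item_support = max(support(frozenset([i]), db) for i in itemset)
    return support(itemset, db) / max_item_support if max_item_support else 0.0


def is_correlated_high_utility(itemset: FrozenSet[str], db: List[Transaction],
                               profit: Dict[str, int],
                               min_util: int, min_cor: float) -> bool:
    """A pattern qualifies only if it passes both thresholds."""
    return (utility(itemset, db, profit) >= min_util
            and all_confidence(itemset, db) >= min_cor)


if __name__ == "__main__":
    for items in (frozenset({"diamond", "pen"}), frozenset({"bread", "milk"})):
        print(sorted(items),
              utility(items, DB, UNIT_PROFIT),
              round(all_confidence(items, DB), 2),
              is_correlated_high_utility(items, DB, UNIT_PROFIT, 10, 0.5))
```

In this toy data, {diamond, pen} has by far the highest utility but appears together in only one of the four transactions that contain a pen, so a utility-only miner would report it while the combined test rejects it. {bread, milk} has a much lower utility but passes both thresholds.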
