Flexible Pattern Discovery and Analysis

High utility-occupancy pattern mining (HUOPM), which analyzes the proportion of utility that a pattern contributes to its supporting transactions, has recently attracted widespread attention in data mining. Unlike high-utility pattern mining (HUPM), which enumerates high-utility (e.g., profitable) patterns, HUOPM aims to find patterns that represent a collection of existing transactions. In practical applications, however, not all patterns are useful or valuable. For example, a pattern that contains too many items may be too specific and therefore of little value to users in real life. To obtain qualified patterns of flexible length, we constrain the minimum and maximum pattern lengths during the mining process and introduce a novel algorithm for mining flexible high utility-occupancy patterns. Our algorithm is referred to as HUOPM+. To ensure the flexibility of the patterns and tighten the upper bound on utility-occupancy, a strategy called the length upper-bound (LUB) is presented to prune the search space. In addition, a utility-occupancy nested list (UO-nlist) and a frequency-utility-occupancy table (FUO-table) are employed to avoid multiple scans of the database. Experimental results on both real-world and synthetic datasets confirm that the proposed algorithm can effectively control the length of the derived patterns. Moreover, it can decrease the execution time and memory consumption.
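To make the notions of utility-occupancy and length constraints concrete, the following is a minimal brute-force sketch, not the proposed algorithm: it uses neither the LUB pruning strategy nor the UO-nlist and FUO-table structures. It assumes the usual definition in which the utility-occupancy of a pattern is the share of a supporting transaction's total utility that the pattern contributes, averaged over all supporting transactions. The toy database `DB`, the function names, and the threshold parameters are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility of that item
# in the transaction (e.g., quantity times unit profit).
DB = [
    {"a": 4, "b": 6},
    {"a": 2, "b": 2, "c": 4},
    {"b": 5, "c": 5},
    {"a": 1, "c": 9},
]


def utility_occupancy(pattern, db):
    """Return (support, utility-occupancy) of a pattern.

    Assumed definition: for every transaction containing the pattern, take the
    pattern's utility divided by the transaction's total utility, then average
    these shares over the supporting transactions.
    """
    shares = []
    for transaction in db:
        if all(item in transaction for item in pattern):
            pattern_utility = sum(transaction[item] for item in pattern)
            shares.append(pattern_utility / sum(transaction.values()))
    if not shares:
        return 0, 0.0
    return len(shares), sum(shares) / len(shares)


def mine_flexible(db, min_sup, min_uo, min_len, max_len):
    """Exhaustively enumerate patterns whose length lies in [min_len, max_len]
    and keep those that meet the support and utility-occupancy thresholds."""
    items = sorted({item for transaction in db for item in transaction})
    result = {}
    for length in range(min_len, max_len + 1):
        for pattern in combinations(items, length):
            support, uo = utility_occupancy(pattern, db)
            if support >= min_sup and uo >= min_uo:
                result[pattern] = (support, uo)
    return result


if __name__ == "__main__":
    # Only patterns of exactly two items are considered here.
    for pattern, (support, uo) in mine_flexible(DB, 2, 0.8, 2, 2).items():
        print(pattern, support, uo)
```

With these hypothetical thresholds, the length bounds exclude the three-item pattern even though it accounts for all of the utility of its single supporting transaction, which illustrates how the constraints filter out overly specific patterns. Exhaustive enumeration grows exponentially with the number of items, which is why pruning strategies and compact data structures matter in practice.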
