Paper information - An efficient algorithm for mining high utility itemsets with negative item values in large databases

Utility itemsets typically consist of items with different values, such as utilities, and the aim of utility mining is to identify the itemsets with the highest utilities. In past studies on utility mining, the values of utility itemsets were assumed to be positive. In some applications, however, an itemset may be associated with negative item values. Hence, the discovery of high utility itemsets with negative item values is important for mining interesting patterns such as association rules. In this paper, we propose a novel method, namely HUINIV (High Utility Itemsets with Negative Item Values)-Mine, for efficiently and effectively mining high utility itemsets from large databases with consideration of negative item values. To the best of our knowledge, this is the first work that considers the concept of negative item values in utility mining. The novel contribution of HUINIV-Mine is that it identifies high utility itemsets while generating fewer high transaction-weighted utilization itemsets, so that the execution time of mining the high utility itemsets is reduced substantially. In this way, the process of discovering all high utility itemsets with consideration of negative item values can be accomplished with lower requirements on memory space and CPU I/O. This meets the critical requirements of temporal and spatial efficiency for mining high utility itemsets with negative item values. Experimental evaluation shows that HUINIV-Mine substantially outperforms other methods by generating far fewer candidate itemsets under different experimental conditions.
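To make the terms in the abstract concrete, the following is a minimal sketch of itemset utility and transaction-weighted utilization (TWU) on a toy database that contains a negative item value. It is not the HUINIV-Mine algorithm. The data, the function names, and the `positive_only` option are assumptions introduced for illustration, and the definitions follow the ones commonly used in the utility mining literature rather than anything stated in the abstract. The sketch shows why negative values matter: with a negative item present, the plain TWU of an itemset can fall below its actual utility, so it no longer serves as a safe upper bound for pruning. Counting only positive utilities is one possible remedy and is shown here purely as an assumption.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, FrozenSet

# Hypothetical toy data: per-unit value of each item (B is sold at a loss).
UNIT_VALUE: Dict[str, int] = {"A": 5, "B": -4, "C": 3}

# Each transaction maps an item to the quantity purchased.
TRANSACTIONS: List[Dict[str, int]] = [
    {"A": 1, "B": 2},
    {"A": 2, "C": 1},
    {"B": 1, "C": 2},
    {"A": 1, "B": 1, "C": 1},
]


def itemset_utility(itemset: Iterable[str]) -> int:
    """Sum of quantity * unit value over every transaction containing the itemset."""
    items = frozenset(itemset)
    total = 0
    for t in TRANSACTIONS:
        if items.issubset(t):
            total += sum(t[i] * UNIT_VALUE[i] for i in items)
    return total


def transaction_utility(t: Dict[str, int], positive_only: bool = False) -> int:
    """Utility of a whole transaction; optionally ignore negative contributions."""
    values = (q * UNIT_VALUE[i] for i, q in t.items())
    return sum(v for v in values if v > 0 or not positive_only)


def twu(itemset: Iterable[str], positive_only: bool = False) -> int:
    """Transaction-weighted utilization: total utility of transactions containing the itemset."""
    items = frozenset(itemset)
    return sum(
        transaction_utility(t, positive_only)
        for t in TRANSACTIONS
        if items.issubset(t)
    )


def high_utility_itemsets(min_util: int) -> Set[FrozenSet[str]]:
    """Brute-force enumeration, feasible only for tiny examples like this one."""
    names = sorted(UNIT_VALUE)
    result = set()
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            if itemset_utility(combo) >= min_util:
                result.add(frozenset(combo))
    return result


if __name__ == "__main__":
    print(itemset_utility({"A"}))            # actual utility of {A}
    print(twu({"A"}))                        # plain TWU, pulled down by B's negative value
    print(twu({"A"}, positive_only=True))    # TWU counting only positive utilities
    print(high_utility_itemsets(min_util=20))
```

On this toy data the plain TWU of {A} is smaller than the utility of {A}, whereas the positive-only variant stays at or above it. That is the property a pruning bound needs.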
