Efficient Parallel Mining of High-utility Itemsets on Multicore Processors

High-utility itemset mining generalizes the well-known frequent itemset mining problem: it considers not only how often an itemset occurs but also quantitative criteria such as unit profit. Because it applies to a wider spectrum of knowledge discovery tasks, many algorithmic improvements have been studied over the past two decades. By contrast, little effort has gone into exploiting hardware performance, despite significant changes in hardware trends. This paper presents a novel parallelization method called DPHIM (Dynamic Parallelization for High-utility Itemset Mining). DPHIM dynamically decomposes the execution of high-utility itemset mining into subtasks in order to leverage logical data parallelism, and carefully assigns the subtasks and their related data to physical resources, such as processing cores and nearby memory, in a NUMA-aware manner. Our extensive experiments confirm that, on DRAM, DPHIM runs up to 65.23 times faster than a fully tuned serial execution, up to 23.54 times faster than static partitioning, and up to 2.51 times faster than the best of the alternative dynamic parallel executions across a variety of datasets and configurations. We also demonstrate that DPHIM works effectively on persistent memory: it shows similar thread scalability trends and is 1.07 to 2.43 times slower on persistent memory than on DRAM.
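To make the distinction between frequency and utility concrete, the following is a minimal sketch of the problem definition only; it is not DPHIM and does not reflect the paper's implementation. The toy transactions, unit profits, and the function names `support`, `utility`, and `high_utility_itemsets` are all assumptions introduced here for illustration, and the brute-force enumeration is feasible only for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]

# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 1, "c": 10}


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset):
    """Sum of quantity * unit profit over the transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(min_utility):
    """Brute-force enumeration of all itemsets whose utility reaches min_utility."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = utility(itemset)
            if u >= min_utility:
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in high_utility_itemsets(30).items():
        print(itemset, "support:", support(itemset), "utility:", u)
```

In this toy data, the single item `a` has utility 20 and misses a threshold of 30, while its superset `{a, c}` has utility 35 and meets it. Utility is therefore not anti-monotone the way support is, which is one reason high-utility itemset mining needs different pruning strategies from frequent itemset mining.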
