Fault Tolerant Frequent Pattern Mining

The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been used extensively to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use both in-memory and disk-based checkpoints, although in many cases recovery completes without any disk access and checkpointing incurs no memory overhead. We evaluate our fault-tolerant (FT) algorithm on a large-scale InfiniBand cluster with several large datasets, using up to 2K cores. Our evaluation demonstrates excellent checkpointing and recovery efficiency in comparison to the disk-based approach. We also observe an average 20x speedup over Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
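
The abstract does not spell out how the dataset memory is reused, so the following is only a toy, single-process Python sketch of the general idea under stated assumptions, not the MPI-based algorithm proposed here. It assumes that transactions already inserted into the FP-tree are no longer needed in raw form, so their slots in the dataset buffer can hold a compact checkpoint of the partial tree. All names (`insert`, `flatten`, `checkpoint`, `restore`, `ordered`) are hypothetical, and the item order is assumed to have been computed by an earlier frequency-counting pass.

```python
def insert(tree, items, count=1):
    """Add an ordered transaction to a prefix tree of item -> [count, children]."""
    node = tree
    for item in items:
        entry = node.setdefault(item, [0, {}])
        entry[0] += count
        node = entry[1]

def flatten(tree, prefix=()):
    """Yield (path, n) for every path at which n transactions end."""
    for item, (count, children) in tree.items():
        path = prefix + (item,)
        ending = count - sum(c for c, _ in children.values())
        if ending > 0:
            yield path, ending
        yield from flatten(children, path)

def checkpoint(tree, buffer, consumed):
    """Overwrite already-consumed buffer slots with the flattened tree."""
    records = list(flatten(tree))
    # One record per distinct ordered transaction, so it fits in the consumed slots.
    assert len(records) <= consumed
    buffer[:len(records)] = records
    return len(records)

def restore(buffer, n_records):
    """Rebuild the partial tree from the records stored in the buffer."""
    tree = {}
    for path, count in buffer[:n_records]:
        insert(tree, path, count)
    return tree

# Assumed frequency-descending item order (hypothetical input).
rank = {item: i for i, item in enumerate(["a", "b", "c"])}

def ordered(transaction):
    return tuple(sorted((i for i in transaction if i in rank), key=rank.get))

original = [["b", "a"], ["a", "c", "b"], ["a"], ["c", "b"]]
data = [list(t) for t in original]  # working buffer, overwritten in place

# Process the first two transactions, then checkpoint into their slots.
tree = {}
for t in data[:2]:
    insert(tree, ordered(t))
n = checkpoint(tree, data, consumed=2)

# Simulated failure: the tree is lost, then restored and the run resumed.
recovered = restore(data, n)
for t in data[2:]:
    insert(recovered, ordered(t))

# Failure-free reference built from the untouched copy of the dataset.
reference = {}
for t in original:
    insert(reference, ordered(t))
```

In this sketch the checkpoint never needs more slots than the number of transactions already consumed, which is what lets it live inside the dataset buffer without additional memory. A distributed version would also have to place the checkpoint in the memory of a different process so that it survives the failure of its owner; that part is outside the scope of this toy example.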
