A Dynamic Load Balancing Scheme for Distributed Formal Concept Analysis

Formal Concept Analysis (FCA) has applications in several areas, including data mining, artificial intelligence, and software engineering. FCA algorithms are computationally expensive, and their recursion trees have an irregular structure. Several parallel algorithms have been implemented to manage the computational cost of FCA. Most of them assume a shared memory environment in which workers store tasks in, and retrieve tasks from, a shared queue. Although the shared queue approach addresses computation skew through fine-grained sharing, it causes communication bottlenecks in a distributed memory environment. In this work, we propose static and dynamic load balancing strategies that are applicable in a distributed memory environment. We parallelize the FCA algorithm Linear time Closed itemset Miner (LCM) and show that the proposed load balancing strategies deal effectively with the computation skew: they distribute the load evenly among the workers and also minimize the communication overhead.
