Parallelization of Graph Mining using Backtrack Search Algorithm

This paper describes parallel implementations of a highly complex graph mining problem: extracting Common Itemset connected subGraphs (CIGs) from a graph whose vertices are each labeled with a set of items, or itemset for short. The problem is to extract all connected subgraphs of a given graph for which the cardinality of the common itemset, i.e., the intersection of the itemsets of all vertices in the subgraph, is not less than a given threshold. We parallelize this mining problem for both shared and distributed memory environments. This kind of graph mining can be applied to the analysis of social and biological networks. An efficient sequential backtrack search algorithm named COPINE has already been proposed for this problem. COPINE avoids unnecessary searches using a pruning mechanism that depends on the knowledge acquired during the left- and depth-first traversal of its search tree. In a parallel search where a unique set of subtrees is assigned to each worker as its task, branches in those subtrees must also be pruned to reduce the search space efficiently, but they must not be pruned excessively through blind consultation of the knowledge acquired by another worker. To avoid such excessive pruning, we identified a restriction that must be imposed on workers when they refer to the knowledge acquired by other workers. Taking this restriction into account, we designed a parallel extension of COPINE. Because the search trees in COPINE have an irregular structure, dynamic load balancing should be applied in parallelized implementations. Applications with these properties are often implemented in task-parallel languages, which let us dynamically spawn tasks that are automatically assigned to workers running as parallel threads and/or processes. We implemented the parallel COPINE algorithm using the task-parallel language Tascell, which offers high performance in various backtrack search algorithms.
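To make the problem statement concrete, the following is a minimal brute-force sketch of the CIG definition, not COPINE itself and with none of its pruning. It assumes that a subgraph is identified by its vertex set, that the graph is given as an adjacency dictionary, and that the names `is_connected`, `common_itemset`, and `enumerate_cigs` are purely illustrative. It is exponential in the number of vertices and is meant only for toy inputs.

```python
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if the given vertices induce a connected subgraph of adj."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def common_itemset(vertices, itemsets):
    """Intersect the itemsets of all given vertices."""
    it = iter(vertices)
    common = set(itemsets[next(it)])
    for v in it:
        common &= itemsets[v]
    return common


def enumerate_cigs(adj, itemsets, threshold):
    """Brute force: test every non-empty vertex subset (assumed subgraph model)."""
    result = []
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_connected(subset, adj):
                common = common_itemset(subset, itemsets)
                if len(common) >= threshold:
                    result.append((subset, common))
    return result


# Toy input (hypothetical): a path 1 - 2 - 3 with small itemsets.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
itemsets = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"b", "c"}}
cigs = enumerate_cigs(adj, itemsets, threshold=2)
```

With a threshold of 2, the whole path is rejected because its vertices share only the item `b`, while the pairs `(1, 2)` and `(2, 3)` qualify. A backtrack search such as COPINE explores the same space as a tree and prunes branches instead of testing every subset.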
The parallel COPINE algorithm requires workers to share the knowledge acquired for the pruning mechanism. For shared memory environments, we implemented a sharing method in which a single table controlled by locks is shared among all workers. This method enables a worker to refer immediately to the knowledge acquired by another worker. In addition, we proposed a task creation strategy whereby useful pruning knowledge can be acquired as early as possible in a parallel search. When implementing this algorithm in distributed memory environments, we must consider the cost of internode communication. The sharing method described above is impractical in distributed memory environments because internode communication would be required every time a worker acquires new knowledge. Therefore, we also implemented a sharing method in which each computing node manages its own table and periodically sends table updates to the other nodes. Furthermore, the conventional work-stealing strategy in Tascell, which aims to minimize the number of internode work-steals, could cause a load imbalance by increasing the number of intranode work-steals for small tasks. We solved this problem by implementing new work-stealing strategies in Tascell that enable workers to obtain larger tasks.
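The per-node sharing method can be pictured with a small sketch. This is an illustration under stated assumptions, not the Tascell implementation: the abstract does not say what a knowledge entry contains or how updates are transmitted, so entries are treated as opaque hashable values, the class `NodeTable` and its methods are hypothetical, and the "send" step is simulated by passing a batch between two in-process objects. The restriction on which knowledge a worker may consult is also omitted.

```python
import threading


class NodeTable:
    """Hypothetical per-node knowledge table, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = set()   # everything this node currently knows
        self._pending = set()   # locally acquired, not yet sent to other nodes

    def record(self, entry):
        """Store knowledge acquired by a local worker and queue it for sending."""
        with self._lock:
            if entry not in self._entries:
                self._entries.add(entry)
                self._pending.add(entry)

    def knows(self, entry):
        """Look up an entry; local workers see local updates immediately."""
        with self._lock:
            return entry in self._entries

    def drain_updates(self):
        """Called periodically: return the queued batch and reset the queue."""
        with self._lock:
            batch, self._pending = self._pending, set()
            return batch

    def merge_remote(self, batch):
        """Apply a batch received from another node without re-queuing it."""
        with self._lock:
            self._entries |= batch


# Simulated exchange between two nodes (no real network involved).
node_a, node_b = NodeTable(), NodeTable()
node_a.record("entry-1")
seen_before = node_b.knows("entry-1")      # node_b has not received it yet
node_b.merge_remote(node_a.drain_updates())
seen_after = node_b.knows("entry-1")
```

Batching updates this way trades freshness for communication cost: a remote node may search a branch that newer knowledge would have pruned, but no internode message is needed each time a worker records an entry.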
