Parallelization of Extracting Connected Subgraphs with Common Itemsets in Distributed Memory Environments

This paper proposes a parallel implementation of a graph mining problem in distributed memory environments using the task-parallel language Tascell. Given a graph and an itemset associated with each of its vertices, the problem is to extract all connected subgraphs whose vertices share a common itemset of at least a given threshold size. For this problem, we have already proposed a parallelization of a backtrack search algorithm named COPINE and its implementation in shared memory environments. In that implementation, all workers share a single lock-protected table that holds the knowledge acquired during the search and is used to avoid unnecessary searching. This sharing method is not practical in distributed memory environments because it would drastically increase the cost of internode communication. We therefore implemented a sharing method in which each computing node has its own table and sends its updates to the other nodes at regular time intervals. In addition, the task creation cost for COPINE is high. As a result, the conventional work-stealing strategy in Tascell, which aims to minimize the number of internode work-steals, significantly degrades performance because it increases the number of intranode work-steals for small tasks. We solved this problem by promoting workers so that they can request tasks from external nodes. We also employed a work-stealing strategy based on an estimate of the sizes of the tasks that victim workers would create. This approach achieved good speedup with up to 8 nodes × 16 workers.
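The per-node table with periodic updates can be illustrated with a minimal Python sketch. This is not the Tascell implementation: the class `NodeTable`, its methods, the use of a set of hashable entries as the table, and the explicit `now` argument standing in for a timer are all assumptions made for illustration, and direct method calls stand in for internode messages.

```python
class NodeTable:
    """Hypothetical per-node table that batches its updates for other nodes."""

    def __init__(self, node_id, flush_interval):
        self.node_id = node_id
        self.flush_interval = flush_interval
        self.known = set()     # entries this node can use to skip searches
        self.pending = set()   # local discoveries not yet sent to other nodes
        self.last_flush = 0.0

    def record(self, entry):
        """Store knowledge found locally and queue it for the next flush."""
        if entry not in self.known:
            self.known.add(entry)
            self.pending.add(entry)

    def can_prune(self, entry):
        """True if the entry is already known on this node."""
        return entry in self.known

    def merge(self, entries):
        """Apply a batch of updates received from another node."""
        self.known.update(entries)

    def maybe_flush(self, now, peers):
        """Send pending updates to peers once the interval has elapsed.

        Returns the number of entries sent (0 if nothing was sent).
        """
        if now - self.last_flush < self.flush_interval:
            return 0
        self.last_flush = now
        if not self.pending:
            return 0
        batch = frozenset(self.pending)
        for peer in peers:
            if peer is not self:
                peer.merge(batch)  # stands in for an internode message
        self.pending.clear()
        return len(batch)


# Two nodes; node a discovers an entry and node b learns it only after a flush.
a = NodeTable(0, flush_interval=1.0)
b = NodeTable(1, flush_interval=1.0)
entry = frozenset({"x", "y"})
a.record(entry)
sent_early = a.maybe_flush(0.5, [a, b])  # interval not yet elapsed
stale = b.can_prune(entry)               # b has not received the update
sent_late = a.maybe_flush(1.0, [a, b])   # interval elapsed, batch is sent
fresh = b.can_prune(entry)               # b can now skip this search
```

In this sketch a node may repeat a search that another node has already completed, because updates arrive only at the next flush. That is the trade-off of batching: some redundant work in exchange for far fewer internode messages than a single lock-protected table would require.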
