A novel cooperative accelerated parallel two-list algorithm for solving the subset-sum problem on a hybrid CPU-GPU cluster

Many parallel algorithms have recently been developed to accelerate solving the subset-sum problem (SSP) on heterogeneous CPU-GPU systems. However, within each compute node, only one CPU core is used to control one GPU while all remaining CPU cores sit idle, so most of the CPU capacity is wasted. In this paper, based on a cost-optimal parallel two-list algorithm, we propose a novel heterogeneous cooperative computing approach for solving the subset-sum problem on a hybrid CPU-GPU cluster that makes full use of all available computational resources of the cluster. Unbalanced workload distribution and high communication overhead are the two main obstacles to heterogeneous cooperative computing. To assign the most suitable workload to each compute node, partition it reasonably between the CPU and GPU within each node, and minimize inter-node and intra-node communication costs, we design a communication-avoiding workload distribution scheme suited to the parallel two-list algorithm. Based on this scheme, we provide an efficient heterogeneous cooperative implementation of the algorithm. A series of experiments is conducted on a hybrid CPU-GPU cluster in which each node has two 6-core CPUs and one GPU. The results show that heterogeneous cooperative computing significantly outperforms CPU-only or GPU-only computing.

A novel cooperative accelerated parallel two-list algorithm for solving SSP is explored.
A heterogeneous cooperative computing approach for CPU-GPU clusters is proposed.
A communication-avoiding workload distribution scheme suitable for the two-list algorithm is designed.
An efficient heterogeneous cooperative implementation of the two-list algorithm is provided.
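
For readers unfamiliar with the two-list algorithm on which this work builds, the following is a minimal sequential sketch in Python of the classic meet-in-the-middle scheme. It is not the parallel or CPU-GPU cooperative implementation described in the abstract; the function names `subset_sums` and `two_list_subset_sum` are illustrative, and for brevity the two lists are sorted rather than generated by merging, which adds a logarithmic factor over the textbook cost.

```python
def subset_sums(items):
    """Return every subset sum of `items` as a sorted list of (sum, mask) pairs.

    Bit k of `mask` records whether items[k] belongs to the subset.
    """
    sums = [(0, 0)]
    for k, weight in enumerate(items):
        # Extend every subset seen so far with the current item.
        sums += [(s + weight, mask | (1 << k)) for s, mask in sums]
    sums.sort()  # assumption: sorting instead of incremental merging
    return sums


def two_list_subset_sum(weights, target):
    """Return a sublist of `weights` summing to `target`, or None if none exists."""
    half = len(weights) // 2
    left, right = weights[:half], weights[half:]

    a = subset_sums(left)   # scanned in ascending order
    b = subset_sums(right)  # scanned in descending order

    i, j = 0, len(b) - 1
    while i < len(a) and j >= 0:
        total = a[i][0] + b[j][0]
        if total == target:
            chosen = [left[k] for k in range(len(left)) if (a[i][1] >> k) & 1]
            chosen += [right[k] for k in range(len(right)) if (b[j][1] >> k) & 1]
            return chosen
        if total < target:
            i += 1  # need a larger sum: advance in the ascending list
        else:
            j -= 1  # need a smaller sum: step back in the descending scan
    return None


if __name__ == "__main__":
    solution = two_list_subset_sum([3, 34, 4, 12, 5, 2], 9)
    print(solution, sum(solution) if solution is not None else None)
```

The search phase touches each list at most once, so the two lists of about 2^(n/2) entries each dominate both time and memory; this is the part a parallel version would split across processors.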
