A GPU-based Branch-and-Bound algorithm using Integer-Vector-Matrix data structure

First Branch-and-Bound (B&B) algorithm entirely deployed on GPU.
B&B-tree management with the Integer-Vector-Matrix (IVM) data structure instead of a linked list.
3.3 times faster than a conventional GPU-accelerated B&B based on a linked list.
Branch divergence reduction for B&B on GPU.
Work stealing for parallel B&B on GPU.

Branch-and-Bound (B&B) algorithms are tree-based exploratory methods for solving combinatorial optimization problems to proven optimality. These problems are often large and NP-hard. The B&B-tree is constructed and explored using four operators: branching, bounding, selection and pruning. Such algorithms are irregular, which makes their parallel design and implementation on GPU challenging. Existing GPU-accelerated B&B algorithms perform only part of the algorithm on the GPU and rely on transferring pools of subproblems across the PCI Express bus to the device. To the best of our knowledge, the algorithm presented in this paper is the first GPU-based B&B algorithm that performs all four operators on the device and consequently avoids the data transfer bottleneck between CPU and GPU. The GPU implementation is based on the Integer-Vector-Matrix (IVM) data structure, which is used instead of a conventional linked list to store and manage the pool of subproblems. This paper revisits the IVM-based B&B algorithm on the GPU, addressing the irregularity of the algorithm in terms of workload, memory access patterns and control flow. In particular, the focus is on reducing thread divergence through a judicious mapping of threads onto the data. Compared to a GPU-accelerated B&B based on a linked list, the algorithm presented in this paper solves a set of standard flowshop instances 3.3 times faster on average.
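To make the four operators and the role of IVM more concrete, the following is a minimal sequential sketch in Python, not the GPU implementation described above. It assumes the usual reading of IVM for permutation problems: an integer `I` holds the current depth, a vector `V` holds the position chosen at each depth, and row `d` of a matrix `M` holds the jobs still unscheduled at depth `d`. The function names, the variable layout and the deliberately weak lower bound (partial makespan plus remaining work on the last machine) are assumptions made for illustration; the paper's bounding scheme, thread mapping and work stealing are not reproduced here.

```python
def machine_fronts(proc, seq):
    """Completion time on each machine after scheduling the jobs in seq."""
    m = len(proc[0])
    front = [0] * m
    for j in seq:
        front[0] += proc[j][0]
        for k in range(1, m):
            front[k] = max(front[k], front[k - 1]) + proc[j][k]
    return front


def makespan(proc, perm):
    """Makespan of a complete permutation flowshop schedule."""
    return machine_fronts(proc, perm)[-1]


def lower_bound(proc, prefix, remaining):
    """Illustrative weak bound: the last machine must still process every
    remaining job after it has finished the prefix."""
    return machine_fronts(proc, prefix)[-1] + sum(proc[j][-1] for j in remaining)


def ivm_branch_and_bound(proc):
    """Depth-first B&B over job permutations using an (I, V, M) triple
    instead of a linked list of subproblems."""
    n = len(proc)
    M = [[0] * n for _ in range(n)]  # row d: jobs available at depth d
    M[0] = list(range(n))
    V = [0] * n                      # V[d]: column selected in row d
    I = 0                            # current depth
    best_cost, best_perm = float("inf"), None

    while True:
        # Selection: row I has n - I valid entries; when exhausted, go up.
        if V[I] >= n - I:
            if I == 0:
                break                # whole tree explored
            V[I] = 0
            I -= 1
            V[I] += 1
            continue

        prefix = [M[d][V[d]] for d in range(I + 1)]
        remaining = [M[I][c] for c in range(n - I) if c != V[I]]

        # Bounding
        bound = lower_bound(proc, prefix, remaining)

        if not remaining:            # leaf: the bound is the exact makespan
            if bound < best_cost:
                best_cost, best_perm = bound, prefix
            V[I] += 1
        elif bound >= best_cost:     # pruning: skip this subtree
            V[I] += 1
        else:                        # branching: fill the next row of M
            I += 1
            M[I][: n - I] = remaining
            V[I] = 0

    return best_cost, best_perm
```

In this sketch the whole search state fits in fixed-size arrays, which is the property that makes such a structure attractive on a device with no dynamic linked-list management; how the actual GPU algorithm distributes many such states across threads is beyond what this example shows.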
