Efficient local search on the GPU - Investigations on the vehicle routing problem

We study how to implement local search efficiently on data-parallel accelerators such as Graphics Processing Units. The Distance-constrained Capacitated Vehicle Routing Problem, a computationally very hard discrete optimization problem with high industrial relevance, is the selected vehicle for our investigations. More precisely, we investigate local search with the Best Improving strategy for the 2-opt and 3-opt operators on a giant tour representation. Resource extension functions are used for constant-time move evaluation. Using CUDA, a basic implementation called The Benchmark Version has been developed and deployed on a Fermi architecture Graphics Processing Unit. Both neighborhood setup and evaluation are performed entirely on the device. The Benchmark Version is the initial step of an incremental improvement process in which a number of important implementation aspects have been systematically studied. Ten well-known test instances from the literature are used in computational experiments, and profiling tools are used to identify bottlenecks. In the final version, the device is fully saturated, given a large enough problem instance. A speedup of almost an order of magnitude relative to The Benchmark Version is observed. We conclude that, with some effort, local search may be implemented very efficiently on Graphics Processing Units. Our experiments show, however, that maximum efficiency requires a neighborhood cardinality of at least one million. Full exploration of a billion neighbors takes a few seconds and may be deemed too expensive with the current technology. Reducing neighborhoods through filtering is an obvious remedy. Experiments on simple models of neighborhood filtering indicate, however, that the speedup effect is limited on data-parallel accelerators. We believe these insights are valuable in the design of new metaheuristics that fully utilize modern, heterogeneous processors.
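To make the Best Improving strategy for 2-opt concrete, the following is a minimal CPU sketch in Python, not the CUDA implementation described above. It assumes a single closed tour with symmetric distances and ignores the capacity and distance constraints, so it does not use resource extension functions. All function names (`index_to_move`, `two_opt_delta`, `best_improving_2opt`, `apply_move`, `euclidean`) are hypothetical. The mapping from a linear index `k` to a move `(i, j)` illustrates one common way a GPU thread could derive its own move from its thread index; the paper's actual mapping may differ.

```python
import math

def euclidean(points):
    """Return a distance function over node indices (assumed helper)."""
    return lambda a, b: math.dist(points[a], points[b])

def index_to_move(k):
    """Map a linear index k >= 0 to a pair (i, j) with 0 <= i < j.

    Pairs are numbered k = j*(j-1)/2 + i, so each index (for example
    a thread index on a GPU) identifies exactly one 2-opt move.
    """
    j = (1 + math.isqrt(1 + 8 * k)) // 2
    i = k - j * (j - 1) // 2
    return i, j

def two_opt_delta(tour, dist, i, j):
    """Constant-time cost change of reversing tour[i+1 .. j]."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[(j + 1) % n]
    # Edges (a, b) and (c, d) are replaced by (a, c) and (b, d).
    return dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d)

def best_improving_2opt(tour, dist):
    """Evaluate all n*(n-1)/2 moves and return (delta, move) of the best.

    Returns (0.0, None) if no move strictly improves the tour.
    On a GPU, the loop body would run once per thread, followed by
    a parallel minimum reduction over the deltas.
    """
    n = len(tour)
    best = (0.0, None)
    for k in range(n * (n - 1) // 2):
        i, j = index_to_move(k)
        delta = two_opt_delta(tour, dist, i, j)
        if delta < best[0]:
            best = (delta, (i, j))
    return best

def apply_move(tour, move):
    """Return a new tour with the segment tour[i+1 .. j] reversed."""
    i, j = move
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
```

Under this pair counting, the neighborhood has n(n-1)/2 moves, which first exceeds one million at n = 1415 nodes. This gives a rough sense of the instance size at which the reported saturation threshold is reached for 2-opt, although the neighborhood definition on the giant tour used in the paper may count moves differently.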
