GPU Computing for Parallel Local Search Metaheuristic Algorithms

Local search metaheuristics (LSMs) are efficient methods for solving complex problems in science and industry. They significantly reduce both the size of the search space to be explored and the search time. Nevertheless, the resolution time remains prohibitive for large problem instances. GPU-based massively parallel computing is therefore a major complementary way to speed up the search. However, GPU computing for LSMs has rarely been investigated in the literature. In this paper, we introduce new guidelines for the design and implementation of effective LSMs on GPU. Efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of neighboring solutions to GPU threads, and memory management. These approaches have been evaluated on four well-known combinatorial and continuous optimization problems and four GPU configurations. Compared to a CPU-based execution, speedups of up to 80× are reported for the large combinatorial problems and up to 240× for a continuous problem. Finally, extensive experiments demonstrate the strong potential of GPU-based LSMs compared to cluster or grid-based parallel architectures.
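To illustrate what mapping neighboring solutions to GPU threads can involve, the following minimal sketch converts a flat thread index into one move of a pairwise-exchange (swap) neighborhood. It is an illustration under stated assumptions, not the mapping used in the paper: the swap neighborhood, the row-major pair ordering, and the function name `thread_to_swap` are all hypothetical choices made here. On a GPU, the same integer arithmetic would run inside a kernel, with `k` taken from the global thread index.

```python
from math import isqrt


def thread_to_swap(k: int, n: int) -> tuple[int, int]:
    """Map a flat thread index k to a pair (i, j) with 0 <= i < j < n.

    Assumption (not from the paper): pairs are enumerated row by row,
    (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1), so the neighborhood
    of a permutation of size n holds n*(n-1)/2 swap moves, one per thread.
    """
    m = n * (n - 1) // 2              # neighborhood size
    if not 0 <= k < m:
        raise ValueError("thread index outside the neighborhood")
    r = m - 1 - k                     # index counted from the last pair
    t = (isqrt(8 * r + 1) - 1) // 2   # largest t with t*(t+1)/2 <= r
    i = n - 2 - t                     # row of the pair
    row_start = i * (2 * n - i - 1) // 2  # flat index of pair (i, i+1)
    j = k - row_start + i + 1         # column of the pair
    return i, j


if __name__ == "__main__":
    n = 4
    for k in range(n * (n - 1) // 2):
        print(k, thread_to_swap(k, n))
```

Using integer square roots rather than floating-point ones keeps the mapping exact for large instances, which matters when every thread must evaluate a distinct neighbor.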
