GPU-based Parallel Hybrid Genetic Algorithms

Over the last few years, interest in hybrid metaheuristics has risen considerably in the field of optimization. Combinations of algorithms such as genetic algorithms (GAs) and local search (LS) methods have provided very powerful search algorithms. However, due to their complexity, the computational time of the solution search remains exorbitant when large problem instances are to be solved. Therefore, GPU-based parallel computing is required as a complementary way to speed up the search. This paper presents a new methodology to design and implement hybrid genetic algorithms efficiently and effectively on GPU accelerators.

I. SCHEME OF PARALLELIZATION

Adapting hybrid GAs to the GPU requires taking into account both the characteristics and underlying issues of the GPU architecture and the parallel models of metaheuristics. Since the evaluation of the neighborhood is generally the time-consuming part of hybrid GAs, we focus on the redesign of LS algorithms on GPU (see Fig. 1). We propose a three-level decomposition of the GPU adapted to the popular parallel iteration-level model [1] (generation and evaluation of the neighborhood in parallel), allowing a clear separation of the hierarchical GPU memory management concepts (see Fig. 2).

In the high-level layer, the CPU sends the number of expected running threads to the GPU; candidate neighbors are then generated and evaluated on the GPU (at the intermediate and low levels), and finally the newly evaluated solutions are returned to the host. This model can be seen as a cooperative model between the CPU and the GPU: the GPU is used as a coprocessor in a synchronous manner. The resource-consuming part, i.e. the incremental evaluation kernel, is computed by the GPU, and the rest is handled by the CPU.

The intermediate-level layer focuses on the generation of the LS neighborhood on GPU.
This generation is performed in a dynamic manner, which implies that no explicit structure needs to be allocated or copied (unlike traditional GAs on GPU [2]). Thereby, only the representation of the candidate solution must be copied from the CPU to the GPU. The main difficulty of the intermediate-level layer is therefore to find an efficient mapping between a GPU thread and an LS neighbor candidate solution. In other words, the issue is to determine which solution must be handled by which thread. The answer depends on the solution representation.

Afterwards, GPU memory management of the evaluation function computation is done at the low level. The use of texture memory is a solution for reducing memory transactions due to non-coalesced accesses (e.g. data matrices and the solution which generates the neighborhood). Indeed, texture memory can be seen as a relaxed mechanism for the thread processors to access global memory, because the coalescing requirements do not apply to texture memory accesses.

II. EXPERIMENTATION
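Before turning to the experiments, the thread-to-neighbor mapping of Section I can be made concrete for one common case: a permutation-encoded solution with a pairwise-swap neighborhood of size n(n-1)/2. The host-side Python sketch below (function names are illustrative, not from the paper, and the closed-form index arithmetic is one possible mapping, not necessarily the exact one used in our kernels) shows how a flat thread index can be turned into a swap pair without allocating any explicit neighborhood structure:

```python
import math

def neighborhood_size(n):
    # A pairwise-swap neighborhood of a permutation of size n
    # contains n * (n - 1) / 2 candidate neighbors.
    return n * (n - 1) // 2

def thread_to_neighbor(k, n):
    # Map a flat thread index k in [0, n(n-1)/2) to the swap pair
    # (i, j) with i < j, in closed form: each thread derives its own
    # neighbor, so no neighborhood structure is copied to the device.
    i = n - 2 - int((math.sqrt(4 * n * (n - 1) - 8 * k - 7) - 1) / 2)
    j = k + i + 1 - n * (n - 1) // 2 + (n - i) * (n - i - 1) // 2
    return i, j

def neighbor_to_thread(i, j, n):
    # Inverse mapping (host-side check): which thread handles swap (i, j).
    return i * n - i * (i + 1) // 2 + (j - i - 1)

# Round-trip check on the host for a small instance
n = 6
for k in range(neighborhood_size(n)):
    i, j = thread_to_neighbor(k, n)
    assert 0 <= i < j < n
    assert neighbor_to_thread(i, j, n) == k
```

Inside a kernel the same arithmetic would be applied to the global thread identifier, each thread then performing the incremental evaluation of its swap.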