Parallel ant colony optimization on multi-core SIMD CPUs

Abstract

Ant colony optimization (ACO) is a population-based metaheuristic for solving hard combinatorial optimization problems. Many studies have sought to accelerate ACO on parallel hardware, especially graphics processing units (GPUs). However, because ACO algorithms have irregular (random) data access and control flow patterns, the performance of GPU-based approaches is constrained by hardware limitations. CPU-based SIMD computing for ACO has rarely been investigated in the literature, and how well parallel ACO could perform on multi-core SIMD CPUs remains unknown. In this paper, we present and evaluate a vector-parallel ACO model for multi-core SIMD CPU architectures. In the proposed model, each ant is mapped to a CPU core, and the tour construction of each ant is accelerated by vector instructions. Based on this model, we further propose a new fitness-proportionate selection approach for the tour construction stage, named Vector-based Roulette Wheel (VRW). In this approach, the fitness values are grouped into SIMD lanes and the prefix sum is computed in vector-parallel mode. The proposed algorithm is tested on standard TSP instances ranging from 198 to 4461 cities and shows a speedup factor of 57.8x compared to its single-threaded CPU counterpart. More significantly, we compare our approach with high-performance GPU-based ACOs, and the results demonstrate the strong potential of CPU-based parallel ACOs.
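The abstract names the idea behind VRW (fitness values grouped into SIMD lanes, with a prefix sum computed over them) but does not give the procedure. The sketch below is one plausible reading, not the authors' implementation: it emulates lane groups with plain Python lists, sums each group, takes a prefix sum over the group totals to locate the group that contains the random draw, and then scans only that group. The function name `vector_roulette_select`, the lane width `LANES = 8`, and the two-level search are all assumptions made for illustration; fitness values are assumed to be non-negative, with zero marking cities that cannot be chosen.

```python
from itertools import accumulate

LANES = 8  # assumed SIMD width, e.g. eight 32-bit floats in a 256-bit register


def vector_roulette_select(fitness, r, lanes=LANES):
    """Pick an index with probability proportional to fitness, for a draw r in [0, 1).

    Hypothetical sketch: lane groups are emulated with Python lists.
    """
    # Pad with zeros so every group fills a whole "vector register".
    padded = list(fitness) + [0.0] * (-len(fitness) % lanes)
    groups = [padded[i:i + lanes] for i in range(0, len(padded), lanes)]

    # One total per group (a horizontal add on real hardware),
    # followed by a prefix sum over those totals.
    group_prefix = list(accumulate(sum(g) for g in groups))
    if not group_prefix or group_prefix[-1] <= 0.0:
        raise ValueError("fitness values must have a positive sum")
    threshold = r * group_prefix[-1]

    # First group whose cumulative total exceeds the threshold.
    g = next((i for i, s in enumerate(group_prefix) if s > threshold),
             len(groups) - 1)

    # Scan only inside that group, starting from the total of earlier groups.
    running = group_prefix[g - 1] if g > 0 else 0.0
    for lane, value in enumerate(groups[g]):
        running += value
        if value > 0.0 and running > threshold:
            return g * lanes + lane

    # Rounding fallback: return the last candidate with positive fitness.
    return max(i for i, v in enumerate(fitness) if v > 0.0)
```

For example, `vector_roulette_select([1.0, 2.0, 3.0, 4.0], 0.5)` scales the draw to a threshold of 5.0 and returns the first index whose cumulative fitness exceeds it. With real vector instructions the per-group sums and the in-group scan would each operate on a whole register at once; the list-based version only illustrates the data layout.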
