Generic techniques in general purpose GPU programming with applications to ant colony and image processing algorithms

In 2006 NVIDIA introduced a new unified GPU architecture facilitating general-purpose computation on the GPU. The following year NVIDIA introduced CUDA, a parallel programming architecture for developing general-purpose applications for direct execution on the new unified GPU. CUDA exposes the GPU's massively parallel architecture so that parallel code can be written to execute much faster than its sequential counterpart. Although CUDA abstracts the underlying architecture, fully utilising and scheduling the GPU is non-trivial and has given rise to a new active area of research. Because of the inherent complexities of GPU development, in this thesis we explore and find efficient parallel mappings of existing and new parallel algorithms on the GPU using NVIDIA CUDA. We place particular emphasis on metaheuristics, image processing, and designing reusable techniques and mappings that can be applied to other problems and domains. We begin by focusing on Ant Colony Optimisation (ACO), a nature-inspired heuristic approach for solving optimisation problems. We present a versatile, improved data-parallel approach for solving the Travelling Salesman Problem using ACO, resulting in significant speedups. Extending our initial work, we show that existing mappings of ACO on the GPU are unable to compete against their sequential counterparts when common CPU optimisation strategies are employed, and we detail three distinct candidate set parallelisation strategies for execution on the GPU. By further extending our data-parallel approach, we present the first implementation of an ACO-based edge detection algorithm on the GPU, reducing the execution time and improving the viability of ACO-based edge detection. We finish by presenting a new color edge detection technique, based on the volume of a pixel in the HSI color space, that is able to withstand greater levels of noise than existing algorithms, along with a parallel GPU implementation.
