Creating and Debugging Performance CUDA C

We describe practical ways of testing, locating and removing bugs in applications for general-purpose computation on graphics hardware (GPGPU). Some of these are generic, whilst others relate directly to stochastic bioinspired techniques such as genetic programming. We pass on software engineering lessons learnt during CUDA C programming and ways to obtain high performance from nVidia GPU and Tesla cards, including examples of both successful and less successful recent applications.
