Large-Scale Bioinformatics Data Mining with Parallel Genetic Programming on Graphics Processing Units

The NCBI GEO GSE3494 breast cancer dataset contains hundreds of biopsies measured on Affymetrix HG-U133A and HG-U133B GeneChips, each with a million variables. Multiple genetic programming (GP) runs on graphics processing unit (GPU) hardware, each with a population of five million programs, both winnow (select) useful variables from the chaff and evolve small (three-input) data models. The single program multiple data (SPMD) CUDA interpreter exploits the GPU's single instruction multiple data (SIMD) mode of parallel computing, even though the GP populations contain different programs. A 448-core nVidia Tesla C2050 (Fermi) graphics card delivers 8.5 giga GP operations per second (GPop/s). In addition to describing our implementation, we survey current general-purpose GPU (GPGPU) work in bioinformatics and genetic programming.
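The core idea can be illustrated on a CPU. Many different GP programs are evaluated on the same fitness cases, and the variables read by the fittest few-input models point to the useful ones. The following is a minimal Python sketch under stated assumptions, not the chapter's CUDA interpreter. The reverse-Polish token format, the names `interpret`, `evaluate` and `winnow`, protected division, and counting one GP operation per token per fitness case are all hypothetical choices made for illustration. The sketch shows why the workload suits parallel hardware: every (program, fitness case) evaluation is independent of every other.

```python
from collections import Counter
from typing import List, Sequence, Tuple, Union

# Hypothetical encoding, not the chapter's actual one: a program is a
# reverse-Polish token list. ("x", i) pushes input variable i, ("c", v)
# pushes constant v, and an operator string pops two values, pushes one.
Token = Union[Tuple[str, float], str]
Program = List[Token]
MAX_INPUTS = 3  # small data models read at most three variables


def _div(a: float, b: float) -> float:
    # Protected division, a common GP convention (assumed here).
    return a / b if b != 0 else 1.0


OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": _div}


def interpret(program: Program, row: Sequence[float]) -> float:
    """Run one program on one fitness case using a small value stack."""
    stack: List[float] = []
    for tok in program:
        if isinstance(tok, tuple):
            kind, val = tok
            stack.append(float(row[int(val)]) if kind == "x" else float(val))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
    return stack[-1]


def inputs_used(program: Program) -> set:
    """Indices of the input variables a program reads."""
    return {int(t[1]) for t in program if isinstance(t, tuple) and t[0] == "x"}


def evaluate(population: List[Program], rows, labels) -> Tuple[List[int], int]:
    """Fitness = number of cases classified correctly (output > 0 means True).

    Each (program, case) pair is independent; a GPU version would spread
    these evaluations across threads. Also returns a GP operation count,
    taken here as one per token per fitness case (an assumption).
    """
    fitness, gpops = [], 0
    for prog in population:
        assert len(inputs_used(prog)) <= MAX_INPUTS
        hits = 0
        for row, label in zip(rows, labels):
            hits += int((interpret(prog, row) > 0) == label)
            gpops += len(prog)
        fitness.append(hits)
    return fitness, gpops


def winnow(population: List[Program], fitness: List[int], keep: int) -> Counter:
    """Count how often each input variable appears in the `keep` fittest programs."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    counts: Counter = Counter()
    for i in ranked[:keep]:
        counts.update(inputs_used(population[i]))
    return counts


# Toy data: four variables, class is True when x0 > x2.
rows = [[3, 0, 1, 5], [0, 2, 4, 1], [5, 1, 2, 0], [1, 3, 0, 2]]
labels = [r[0] > r[2] for r in rows]
population = [
    [("x", 0), ("x", 2), "-"],
    [("x", 1), ("x", 3), "*"],
    [("x", 3), ("c", 2.0), "-"],
]
fitness, gpops = evaluate(population, rows, labels)
print(fitness, gpops)  # [4, 1, 2] 36
print(winnow(population, fitness, keep=2))
```

On the toy data, the program computing x0 - x2 classifies every case correctly. Variables 0 and 2 therefore appear among the fittest programs, and this is the winnowing signal in miniature. Dividing the operation count by wall-clock time gives a GPop/s figure of the same kind as the one quoted for the C2050, although pure Python is far slower.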
