Harnessing the power of idle GPUs for acceleration of biological sequence alignment

This paper presents a parallel system that accelerates biological sequence alignment on a graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that uses idle GPUs and CPUs in offices and homes. Our parallel implementation employs a master-worker paradigm to accelerate Liu's OpenGL-based algorithm, which runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also present experimental results comparing our implementation with three other implementations that run on a single GPU, a single CPU, and multiple CPUs, respectively. The results show that a single non-dedicated GPU provides almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications.
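The master-worker paradigm mentioned above can be illustrated with a minimal CPU-only sketch. It is not the OpenGL-based implementation described in this paper: the function names `sw_score` and `master`, the scoring parameters, and the linear gap penalty are assumptions chosen for illustration, and Python threads merely stand in for idle grid resources. The master hands one database sequence to each task and gathers the Smith-Waterman local alignment scores.

```python
from concurrent.futures import ThreadPoolExecutor


def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative defaults, not those of the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def master(query, database, workers=4):
    """Master: distribute one database sequence per task, gather scores.

    Threads are a stand-in for grid workers (hypothetical setup).
    Returns (score, sequence) pairs, highest score first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: sw_score(query, s), database))
    return sorted(zip(scores, database), reverse=True)


if __name__ == "__main__":
    for score, seq in master("ACGT", ["ACGT", "ACT", "TTTT"]):
        print(score, seq)
```

Because each database sequence is scored independently, tasks can be assigned to whichever worker becomes available, which is the property a cycle-stealing grid relies on.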
