Sequence Alignment on Massively Parallel Heterogeneous Systems

Bioinformatics is a rapidly emerging area of science with many important applications to human life. Sequence alignment, in its various forms, is one of the main tools used in bioinformatics. This work is motivated by the ever-increasing amount of sequence data, which demands more and more computational power to process. This task calls for GPU-based systems, which offer higher computational potential and better energy efficiency than CPUs. We address the problem of accelerating sequence alignment on modern multi-GPU clusters. Our initial step was to develop a fast and scalable GPU-based exact short-sequence aligner. We used a matching algorithm with a small memory footprint, based on the Burrows-Wheeler transform. We developed a mathematical model of computation and communication costs to find the optimal memory partitioning strategy for the index and the queries. Our solution achieves a 10-fold speedup on one GPU over a previous implementation based on suffix arrays, and it scales to multiple GPUs. Our next step will be to adapt the proposed data structure and performance model to multi-node, multi-GPU approximate sequence alignment. We also plan to use exact matching to detect common regions in large sequences, as an intermediate step in full-scale genome comparison.
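As an illustration of exact matching based on the Burrows-Wheeler transform, the following is a minimal CPU sketch of backward search over an FM-index. It is not the GPU implementation described above: the function names `build_fm_index` and `count_and_locate` are hypothetical, the index is built naively by sorting suffixes, and the occurrence table is stored uncompressed, so it shows only the matching logic and not the small memory footprint.

```python
def build_fm_index(text):
    """Build a suffix array, C table and occurrence table (naive construction)."""
    text += "$"  # sentinel; assumed to sort before every other symbol
    # Suffix array by direct suffix sorting: fine for a sketch, too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (wraps to "$" for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of symbols in the text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def count_and_locate(pattern, sa, C, occ):
    """Return the sorted start positions of all exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    # Backward search: extend the match one symbol at a time, right to left.
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # the interval became empty: no occurrence
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, C, occ = build_fm_index("ACGTACGTTACG")
    print(count_and_locate("ACG", sa, C, occ))
```

Each query costs one pair of table lookups per pattern symbol, independent of the reference length, and queries do not depend on one another, which is one reason this kind of search suits many short reads processed in parallel.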
