Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA

Searching for similar 3D protein structures is one of the primary processes in structural bioinformatics. However, the computational complexity of this process means that faster and more efficient methods are constantly needed. Finding the molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) used for general-purpose computing (GPGPU) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. CASSERT matches fragments of the compared proteins in two alignment phases. The GPU implementation of CASSERT (“GPU-CASSERT”), run on a GeForce GTX 560Ti (384 cores, 2 GB RAM), parallelizes both alignment phases and is on average 180 times faster than the single-core CPU implementation running on an Intel Xeon E5620 (2.40 GHz, 4 cores). We show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce its execution time enough for it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.
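The speed-up rests on one property of a database scan: comparing the query with one database structure does not depend on any other comparison, so each comparison can be assigned to its own worker (on a GPU, its own thread or thread block). The sketch below illustrates that data-parallel, two-phase pattern in plain Python. It is a hypothetical illustration, not CASSERT's actual scoring: representing each structure by a secondary-structure string and a residue string, scoring both phases with a simple Smith-Waterman local alignment, and filtering candidates with a `threshold` between the phases are all assumptions made for the example. A thread pool stands in for GPU threads; because of CPython's global interpreter lock it shows the decomposition of the work, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical database entry: (structure id, secondary-structure string, residue string)
Entry = tuple[str, str, str]


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def scan(query_sse: str, query_res: str, database: list[Entry],
         threshold: int, workers: int = 4) -> list[tuple[str, int]]:
    """Two-phase scan in which every database entry is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1 (assumed): cheap, low-resolution comparison of every entry in parallel
        coarse = list(pool.map(lambda e: smith_waterman(query_sse, e[1]), database))
        survivors = [e for e, s in zip(database, coarse) if s >= threshold]
        # Phase 2 (assumed): finer comparison, run in parallel on the survivors only
        fine = list(pool.map(lambda e: smith_waterman(query_res, e[2]), survivors))
    # Rank the surviving structures by their phase-2 score, best first
    return sorted(((e[0], s) for e, s in zip(survivors, fine)), key=lambda t: -t[1])
```

For example, with the toy entries `("P1", "HHHEEE", "ACDEFG")`, `("P2", "EEECCC", "KLMNPQ")` and `("P3", "HHHEEC", "ACDEFH")`, the call `scan("HHHEEE", "ACDEFG", database, threshold=8)` drops P2 after the first phase and returns `[("P1", 12), ("P3", 10)]`. Keeping the expensive phase for the survivors only is one common way to bound the cost of the second phase; the per-entry independence is what lets both phases be spread across many cores.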
