Research of Acceleration MS-Alignment Identifying Post-Translational Modifications on GPU

MS-Alignment is an unrestrictive post-translational modification (PTM) search algorithm. Its advantage is that it searches for all types of PTMs at once in blind mode. However, it is time-consuming, so it cannot cope well with large-scale protein databases and spectral datasets. We use the Graphics Processing Unit (GPU) to accelerate MS-Alignment and reduce identification time enough to meet practical time requirements. The work consists of two main parts. (1) Database search and Candidate generation (DC) is the most time-consuming step of MS-Alignment. We propose a CUDA-based GPU algorithm for DC (DCGPU), which achieves data parallelism by partitioning the protein sequences, and we apply several optimizations to the DCGPU implementation. (2) For further acceleration, we propose an MPI- and CUDA-based algorithm that runs MS-Alignment on a GPU cluster (MC_MS-A). Comparative experiments show an average speedup above 26 under the model allowing at most one modification and above 41 under the model allowing at most two modifications. The experimental results also show that MC_MS-A on a GPU cluster reduces the time to identify 31173 spectra from a predicted 2.853 months to 0.606 h. GPU-accelerated MS-Alignment is therefore well suited to large-scale data that require high-speed processing.
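The data-parallel idea behind DCGPU is that each GPU worker scans a disjoint block of the protein database. Because the blocks do not overlap, the workers need no communication until their results are merged. The small CPU-side Python model below illustrates that decomposition. It is a sketch under stated assumptions, not the authors' implementation. Several pieces are hypothetical and stand in for MS-Alignment's actual filtering: the candidate filter `candidates_in_protein` keeps peptides whose unmodified mass is within a hypothetical `max_shift` of the precursor mass. The contiguous block split in `partition` and the use of Python threads in place of CUDA threads are likewise illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor

# Monoisotopic residue masses (Da) for the amino acids used in the toy example below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to every peptide


def candidates_in_protein(protein_id, seq, precursor_mass, max_shift, max_len=30):
    """Hypothetical, simplified DC filter: return (protein_id, start, end) for every
    substring of `seq` whose unmodified mass is within +/- max_shift Da of the precursor."""
    hits = []
    for start in range(len(seq)):
        mass = WATER
        for end in range(start, min(len(seq), start + max_len)):
            mass += RESIDUE_MASS[seq[end]]
            if abs(mass - precursor_mass) <= max_shift:
                hits.append((protein_id, start, end + 1))
            elif mass > precursor_mass + max_shift:
                break  # residue masses are positive, so longer peptides only get heavier
    return hits


def partition(n_items, n_workers):
    """Split item indices 0..n_items-1 into n_workers contiguous, near-equal blocks."""
    base, extra = divmod(n_items, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def dc_sequential(proteins, precursor_mass, max_shift):
    """Reference version: scan the whole protein database in order."""
    return [hit for i, seq in enumerate(proteins)
            for hit in candidates_in_protein(i, seq, precursor_mass, max_shift)]


def dc_partitioned(proteins, precursor_mass, max_shift, n_workers=4):
    """Each worker (standing in for a GPU thread) scans only its own block of proteins,
    so workers share no state until their candidate lists are concatenated."""
    def work(block):
        out = []
        for i in block:
            out.extend(candidates_in_protein(i, proteins[i], precursor_mass, max_shift))
        return out

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_block = list(pool.map(work, partition(len(proteins), n_workers)))
    # pool.map yields results in block order, so the merge matches the sequential scan.
    return [hit for block_hits in per_block for hit in block_hits]


if __name__ == "__main__":
    proteins = ["GASPVTLKER", "AAGGSSPPVV", "KLEVTR"]
    assert dc_partitioned(proteins, 500.0, 60.0, n_workers=2) == dc_sequential(proteins, 500.0, 60.0)
```

Contiguous blocks keep each worker's reads sequential. In a real CUDA kernel, the block index would be derived from the thread or thread-block index. Python threads are serialized by the GIL, so this model shows only the decomposition and the order-preserving merge, not the speedup.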
