Accelerating the scoring module of mass spectrometry-based peptide identification using GPUs

Background: Tandem mass spectrometry-based database searching is currently the main method for protein identification in shotgun proteomics. The explosive growth of protein and peptide databases, a result of genome translations, enzymatic digestions, and post-translational modifications (PTMs), is making computational efficiency in database searching a serious challenge. Profile analysis shows that most search engines spend 50%-90% of their total time on the scoring module, and that the spectrum dot product (SDP) based scoring module is the most widely used. As general-purpose, high-performance parallel hardware, graphics processing units (GPUs) are promising platforms for speeding up database searches in the protein identification process.

Results: We designed and implemented a parallel SDP-based scoring module on GPUs that makes efficient use of GPU registers, constant memory, and shared memory. Compared with the CPU-based version, we achieved a 30 to 60 times speedup using a single GPU. We also implemented our algorithm on a GPU cluster and achieved a favorable speedup.

Conclusions: Our GPU-based SDP algorithm can significantly improve the speed of the scoring module in mass spectrometry-based protein identification. The algorithm can be easily implemented in many database search engines such as X!Tandem, SEQUEST, and pFind. A software tool implementing this algorithm is available at http://www.comp.hkbu.edu.hk/~youli/ProteinByGPU.html
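To make the scored quantity concrete, the following is a minimal CPU-side sketch of a spectrum dot product in Python. It is not the authors' GPU implementation: the function names, the fixed-width binning scheme, and the `bin_width` parameter are assumptions chosen for illustration. Each spectrum is taken to be a list of `(m/z, intensity)` pairs, and the score is the sum of intensity products over shared m/z bins.

```python
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (illustrative scheme)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def spectrum_dot_product(experimental, theoretical, bin_width=1.0):
    """Dot product of two binned spectra over the bins they share."""
    exp_bins = bin_spectrum(experimental, bin_width)
    theo_bins = bin_spectrum(theoretical, bin_width)
    return sum(
        intensity * theo_bins[bin_index]
        for bin_index, intensity in exp_bins.items()
        if bin_index in theo_bins
    )


def score_candidates(experimental, candidates, bin_width=1.0):
    """Score one experimental spectrum against every candidate peptide.

    `candidates` maps a (hypothetical) peptide label to its theoretical
    peak list. Every pair is scored independently of the others.
    """
    return {
        label: spectrum_dot_product(experimental, theoretical, bin_width)
        for label, theoretical in candidates.items()
    }


# Toy data, invented purely for illustration.
experimental = [(100.2, 2.0), (200.7, 3.0), (300.1, 1.0)]
candidates = {
    "candidate_a": [(100.5, 1.0), (300.9, 1.0), (400.0, 1.0)],
    "candidate_b": [(150.0, 1.0), (250.0, 1.0)],
}
scores = score_candidates(experimental, candidates)
```

Because each spectrum-candidate pair is scored independently, the loop in `score_candidates` is the kind of work that can be distributed across many parallel threads; how the published module maps this onto registers, constant memory, and shared memory is not shown here.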
