High-performance hardware implementation of a parallel database search engine for real-time peptide mass fingerprinting

MOTIVATION: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of the resulting peptides constitute a 'fingerprint' that can be searched against the theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage, this peptide mass list (the protein fingerprint) is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier, we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves an almost 170-fold speed gain relative to a conventional software implementation running on a dual-processor server. In this article, we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers an 1800-fold speed-up compared with an equivalent C software routine running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. Together, the database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution.
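
The second stage described above can be illustrated with a minimal software sketch. It is not the hardware design or the scoring scheme presented in this article: it assumes the commonly used trypsin rule (cleavage after K or R unless followed by P), no missed cleavages, standard monoisotopic residue masses, and a simple count of matched masses as the score. The function names, the tolerance value and the toy database are hypothetical and chosen only for illustration.

```python
# Standard monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def theoretical_fingerprint(sequence):
    """Theoretical peptide mass list for one database protein."""
    return [peptide_mass(p) for p in tryptic_peptides(sequence)]


def count_matches(observed, theoretical, tolerance):
    """Number of observed masses within `tolerance` Da of any theoretical mass."""
    return sum(
        any(abs(m - t) <= tolerance for t in theoretical) for m in observed
    )


def best_match(observed, database, tolerance=0.05):
    """Return the database entry whose fingerprint matches the most masses.

    Illustrative scoring only (a plain match count).
    """
    return max(
        database,
        key=lambda name: count_matches(
            observed, theoretical_fingerprint(database[name]), tolerance
        ),
    )


if __name__ == "__main__":
    # Toy database with made-up sequences, for illustration only.
    database = {"protein_1": "AAKGGR", "protein_2": "WWKFFR"}
    # Simulated experimental mass list: protein_1 peptides with small errors.
    observed = [m + 0.01 for m in theoretical_fingerprint(database["protein_1"])]
    print(best_match(observed, database))
```

Each database entry is scored independently of the others in this sketch, which is the property that makes the search a natural candidate for parallel evaluation.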
