pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry.

This paper describes pFind 2.0, a software package for peptide and protein identification via tandem mass spectrometry. First, the most important feature of pFind 2.0 is that it offers a modular, customizable platform on which third parties can test and compare their algorithms. Developers can create their own modules following the open application programming interface (API) standards and then add them to workflows in place of the default modules. In addition, to accommodate different requirements, the package provides four automated workflows that differ in their algorithm modules, execution processes, and result reports. Based on this design, pFind 2.0 provides an automated target-decoy database search strategy: the user simply specifies a false positive rate (FPR) and starts the search, and the system returns protein identification results automatically filtered at that estimated FPR. Second, pFind 2.0 is both accurate and fast. Many pragmatic preprocessing, peptide-scoring, validation, and protein inference algorithms have been incorporated. To speed up searching, a toolbox for indexing protein databases has been developed for high-throughput applications, and all modules are implemented under a new architecture designed for large-scale parallel and distributed searching. An experiment on a public dataset shows that pFind 2.0 identifies more peptides than SEQUEST and Mascot at a 1% FPR. pFind 2.0 is also shown to have better usability and higher speed than its previous versions. The software and more detailed supplementary information are available at http://pfind.ict.ac.cn/.
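
To make the target-decoy filtering idea concrete, the following is a minimal sketch and not pFind 2.0's actual implementation. It assumes each peptide-spectrum match (PSM) carries a score where higher is better and a flag marking decoy hits, and it assumes the simple estimator FPR ≈ (number of decoy hits) / (number of target hits) above the score cutoff. The abstract does not state which estimator pFind 2.0 uses, and the names `PSM` and `filter_by_fpr` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical PSM record: (score, is_decoy). Higher score means a better match.
PSM = Tuple[float, bool]


def filter_by_fpr(psms: List[PSM], max_fpr: float) -> List[PSM]:
    """Keep the largest top-ranked set of PSMs whose estimated FPR <= max_fpr.

    Assumed estimator (not taken from the paper): decoys / targets among
    the PSMs at or above the cutoff. Only target PSMs are returned.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest cutoff that still satisfies the FPR limit.
        if targets > 0 and decoys / targets <= max_fpr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p[1]]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    print(filter_by_fpr(example, max_fpr=0.0))
    print(filter_by_fpr(example, max_fpr=0.25))
```

In this sketch the caller supplies only the desired FPR, mirroring the workflow described above; the cutoff is chosen automatically from the ranked target and decoy hits. Score ties are not treated specially here.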
