Anatomy and evolution of database search engines: a central component of mass spectrometry based proteomic workflows.

Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.
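As a rough illustration of the task described above (identifying a peptide from a tandem mass spectrum using a reference protein sequence database), the following minimal sketch may help. It is not taken from any of the published search engines. It assumes a simplified trypsin-like digestion (cleavage after K or R, no missed cleavages, no proline rule), singly charged b and y fragment ions only, no modifications, and a naive shared-peak count as score. All function names, tolerances, and the toy database are hypothetical; the residue masses are standard monoisotopic values for a subset of amino acids.

```python
# Toy sequence database search: digest, filter by precursor mass, count matched fragments.
# Illustrative sketch only; real search engines use far more elaborate scoring.

# Monoisotopic residue masses in Da (subset of the 20 standard amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
WATER = 18.01056   # mass of H2O added to the summed residues of a peptide
PROTON = 1.00728   # mass of the charge-carrying proton


def digest(protein):
    """Cleave after every K or R (simplified trypsin rule, an assumption)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide):
    """Theoretical m/z of singly charged b and y ions."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def search(precursor_mass, peaks, proteins, precursor_tol=0.02, fragment_tol=0.02):
    """Return (peptide, matched_fragment_count) for the best candidate, or None."""
    best = None
    for protein in proteins:
        for peptide in digest(protein):
            # Keep only candidates whose mass agrees with the measured precursor.
            if abs(peptide_mass(peptide) - precursor_mass) > precursor_tol:
                continue
            # Score: number of theoretical fragments found among the measured peaks.
            score = sum(
                any(abs(mz - peak) <= fragment_tol for peak in peaks)
                for mz in fragment_mzs(peptide)
            )
            if best is None or score > best[1]:
                best = (peptide, score)
    return best


# Hypothetical two-protein database and a noise-free simulated spectrum.
database = ["GASPKVTLER", "AVEDKLNGR"]
spectrum = fragment_mzs("VTLER")
result = search(peptide_mass("VTLER"), spectrum, database)
```

In this toy setting `result` holds the candidate peptide together with its count of matched fragment ions. A shared-peak count is only the simplest possible scoring function, and the sketch leaves out the statistical validation of matches entirely.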
