Overcoming Species Boundaries in Peptide Identification with BICEPS

Currently, peptides and proteins can be identified reliably only when thoroughly annotated sequence databases are available. While sequencing capacities continue to grow, many organisms still lack the reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not contained exactly in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS) and offer an open-source implementation based on this statistical criterion to
