A survey of molecular descriptors used in mass spectrometry-based proteomics.

Proteomics has grown rapidly in recent years, driven mainly by improvements in instrumentation, methods, and easy-to-use software. These advances have made it possible to address a large number of biological questions and to study the proteomes of several organisms in greater depth. This growth has also created a computational challenge that remains open: analyzing the large datasets that a single proteomics experiment typically generates. One way to tackle this problem is to use auxiliary information generated during the experiment to assess the confidence of the identifications. In this manuscript we review the main molecular descriptors used to build predictive models of retention time, isoelectric point, and peptide "detectability", which are key tools in the design of validation strategies based on these criteria. We also give an overview of the main open-source tools and libraries used for computing molecular descriptors.
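To make the idea of a sequence-derived descriptor concrete, the following minimal sketch computes two simple ones for a peptide: its amino acid composition, and an isoelectric point estimated from a Henderson-Hasselbalch charge model. It is an illustration only, not the method of any tool or model covered in the review. The function names and the pKa values are assumptions chosen for the example; published pKa scales differ, and so will the resulting estimates.

```python
from collections import Counter

# Illustrative pKa values (assumed for this sketch; published scales differ).
PKA_POSITIVE = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_TERM": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def composition(peptide):
    """Return the fraction of each amino acid present in the peptide."""
    counts = Counter(peptide)
    return {aa: n / len(peptide) for aa, n in counts.items()}


def net_charge(peptide, ph):
    """Estimate the net charge at a given pH (Henderson-Hasselbalch model)."""
    counts = Counter(peptide)
    counts["N_TERM"] = 1  # one free amino terminus
    counts["C_TERM"] = 1  # one free carboxyl terminus
    positive = sum(
        counts[group] / (1 + 10 ** (ph - pka))
        for group, pka in PKA_POSITIVE.items()
    )
    negative = sum(
        counts[group] / (1 + 10 ** (pka - ph))
        for group, pka in PKA_NEGATIVE.items()
    )
    return positive - negative


def isoelectric_point(peptide, tol=1e-4):
    """Find the pH where the net charge crosses zero, by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(peptide, mid) > 0:
            lo = mid  # still positively charged: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is sufficient here because the modeled net charge decreases monotonically with pH, so there is exactly one zero crossing between pH 0 and pH 14. Descriptors of this kind could then serve as inputs to a predictor, or the estimated isoelectric point could be compared with an experimental value as one check on an identification.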


[22]  D. Hochstrasser,et al.  The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences , 1993, Electrophoresis.

[23]  Dong-Sheng Cao,et al.  ChemoPy: freely available python package for computational biology and chemoinformatics , 2013, Bioinform..

[24]  Pier Giorgio Righetti,et al.  Determination of the isoelectric point of proteins by capillary isoelectric focusing. , 2004, Journal of chromatography. A.

[25]  R. Beavis,et al.  An Improved Model for Prediction of Retention Times of Tryptic Peptides in Ion Pair Reversed-phase HPLC , 2004, Molecular & Cellular Proteomics.

[26]  Tero Aittokallio,et al.  Filtering strategies for improving protein identification in high‐throughput MS/MS studies , 2009, Proteomics.

[27]  S. Lemeer,et al.  A versatile peptide pI calculator for phosphorylated and N‐terminal acetylated peptides experimentally tested using peptide isoelectric focusing , 2008, Proteomics.

[28]  G. Massolini,et al.  Correctness of Protein Identifications of Bacillus subtilis Proteome with the Indication on Potential False Positive Peptides Supported by Predictions of Their Retention Times , 2009, Journal of biomedicine & biotechnology.

[29]  Michael J MacCoss,et al.  Improving tandem mass spectrum identification using peptide retention time prediction across diverse chromatography conditions. , 2007, Analytical chemistry.

[30]  A. Nesvizhskii A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. , 2010, Journal of proteomics.

[31]  Markus Müller,et al.  Isoelectric point optimization using peptide descriptors and support vector machines. , 2012, Journal of proteomics.

[32]  Juan Antonio Vizcaíno,et al.  HI-bone: a scoring system for identifying phenylisothiocyanate-derivatized peptides based on precursor mass and high intensity fragment ions. , 2013, Analytical chemistry.

[33]  B. Cargile,et al.  An alternative to tandem mass spectrometry: isoelectric point and accurate mass for the identification of peptides. , 2004, Analytical chemistry.

[34]  James P. Reilly,et al.  A computational approach toward label-free protein quantification using predicted peptide detectability , 2006, ISMB.

[35]  R. Aebersold,et al.  Mass Spectrometry and Protein Analysis , 2006, Science.

[36]  Joseph M. Foster,et al.  Chromatographic retention time prediction for posttranslationally modified peptides , 2012, Proteomics.

[37]  J. Wootton,et al.  Analysis of compositionally biased regions in sequence databases. , 1996, Methods in enzymology.

[38]  CHUN WEI YAP,et al.  PaDEL‐descriptor: An open source software to calculate molecular descriptors and fingerprints , 2011, J. Comput. Chem..

[39]  M. Patankar,et al.  RT-SVR+q: a strategy for post-Mascot analysis using retention time and q value metric to improve peptide and protein identifications. , 2011, Journal of proteomics.

[40]  Chris Morley,et al.  Open Babel: An open chemical toolbox , 2011, J. Cheminformatics.

[41]  Ron D. Appel,et al.  ExPASy: the proteomics server for in-depth protein knowledge and analysis , 2003, Nucleic Acids Res..

[42]  J. Yates,et al.  GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model. , 2003, Analytical chemistry.

[43]  Richard D. Smith,et al.  The Utility of Accurate Mass and LC Elution Time Information in the Analysis of Complex Proteomes , 2005, Journal of the American Society for Mass Spectrometry.

[44]  Oliver Kohlbacher,et al.  Improving peptide identification in proteome analysis by a two-dimensional retention time filtering approach. , 2009, Journal of proteome research.

[45]  Ying Xu,et al.  Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information. , 2006, Analytical chemistry.