TOPP - the OpenMS proteomics pipeline

MOTIVATION: Experimental techniques in proteomics have seen rapid development over the last few years. The volume and complexity of the data have grown at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. To facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile and user-friendly applications that allows for rapid construction of computational workflows in proteomics.

RESULTS: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools that can be easily combined into analysis pipelines, even by non-experts, and used in proteomics workflows. These applications range from utilities (file format conversion, peak picking) through wrappers for established applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. We describe the basic concepts and current abilities of TOPP and illustrate them with two example applications: the identification of peptides from a raw dataset through database search, and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups.

AVAILABILITY: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website at www.OpenMS.de.
