Assembling proteomics data as a prerequisite for the analysis of large scale experiments

Background
Although the genome sequences of a large number of bacteria have been completely determined, their proteomes remain relatively poorly defined. Besides new methods to increase the number of identified proteins, new database applications are needed to store and present the results of large-scale proteomics experiments.

Results
In the present study, a database concept was developed to address these issues and to offer complete information via a web interface. In our concept, the Oracle-based data repository system SQL-LIMS plays the central role in the proteomics workflow. It was applied to the proteomes of Mycobacterium tuberculosis, Helicobacter pylori and Salmonella typhimurium, and to protein complexes such as the 20S proteasome. The technical operations of our proteomics labs were used as the standard for SQL-LIMS template creation. By means of a Java-based data parser, post-processed data from different approaches, such as LC/ESI-MS, MALDI-MS and 2-D gel electrophoresis (2-DE), were stored in SQL-LIMS. A minimum set of the proteomics data was transferred to our public 2D-PAGE database using a Java-based interface (Data Transfer Tool, DTT) in accordance with the requirements of the PEDRo standardization. Furthermore, the stored proteomics data could be extracted from SQL-LIMS via XML.

Conclusion
The Oracle-based data repository system SQL-LIMS played the central role in the proteomics workflow concept. The technical operations of our proteomics labs were used as standards for SQL-LIMS templates. Using a Java-based parser, post-processed data from different approaches, such as LC/ESI-MS, MALDI-MS, 1-DE and 2-DE, were stored in SQL-LIMS. Thus, the distinct data formats of different instruments were unified and stored in SQL-LIMS tables. Moreover, a unique submission identifier allowed fast access to all experimental data. This was the main advantage over multi-software solutions, especially when staff turnover is high.

Large-scale and high-throughput experiments must also be managed in a comprehensive repository system such as SQL-LIMS so that results can be queried systematically. On the other hand, these database systems are expensive and require at least one full-time administrator and a specialized lab manager. In addition, the rapid pace of technical change in proteomics may make it difficult to accommodate new data formats. To summarize, SQL-LIMS met the requirements of proteomics data handling, especially in skilled processes such as gel electrophoresis and mass spectrometry, and fulfilled the PSI standardization criteria. The data transfer into the public domain via DTT facilitated validation of proteomics data. Additionally, evaluating mass spectra by post-processing with MS-Screener improved the reliability of mass analysis and prevented the storage of junk data.
