2DB: a Proteomics database for storage, analysis, presentation, and retrieval of information from mass spectrometric experiments

Background

The amount of information stemming from proteomics experiments involving (multidimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow needs to be captured, related, and analyzed. Biological experiments within this scope produce heterogeneous data, ranging from pictures of one- or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms which analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed.

Results

In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the necessity for manual input to the database has been minimized. Information is stored in a generic format that abstracts from the specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross-analyzing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified from mass spectrometry and to link this information directly to the respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is generated automatically, usually triggered either by a change in the underlying protein models or by newly imported identifications. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application.

Conclusion

We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.
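
The spectral counting mentioned in the Results can be illustrated with a minimal sketch. It assumes that the spectral count of a protein in a hot spot is simply the number of distinct spectra assigned to that protein there. The record layout, the names (`identifications`, `spectral_counts`), and the sample values are hypothetical and are not taken from the 2DB schema or implementation.

```python
from collections import Counter

# Hypothetical peptide identifications, one tuple per matched spectrum:
# (hot_spot, protein, peptide_sequence, spectrum_id)
identifications = [
    ("spot_1", "protA", "PEPTIDEK", 101),
    ("spot_1", "protA", "PEPTIDEK", 102),
    ("spot_1", "protA", "ANOTHERR", 103),
    ("spot_1", "protB", "SEQVENCEK", 104),
    ("spot_2", "protA", "PEPTIDEK", 201),
    ("spot_2", "protA", "PEPTIDEK", 201),  # duplicate import of the same spectrum
]


def spectral_counts(ids):
    """Count distinct identified spectra per (hot spot, protein) pair."""
    counts = Counter()
    seen = set()
    for spot, protein, _peptide, spectrum in ids:
        key = (spot, protein, spectrum)
        if key in seen:
            # Ignore a spectrum already counted for this protein and spot.
            continue
        seen.add(key)
        counts[(spot, protein)] += 1
    return counts


counts = spectral_counts(identifications)
for (spot, protein), n in sorted(counts.items()):
    print(f"{spot}\t{protein}\t{n}")
```

Under this assumption, a protein supported by more spectra in a hot spot receives a higher count, which is the quantity a per-hot-spot summary could present alongside the identifications.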
