Connecting X! Tandem to a Database Management System

Protein identification by mass spectrometry is a valuable method in proteomics and metaproteomics. Several protein search engines are used for this task, such as X! Tandem, MASCOT, OMSSA and SEQUEST, and they receive their input data in the form of files. With the rapid growth of proteomics and metaproteomics, new measurement devices keep expanding research capabilities and routinely produce very large volumes of data. File-based search engines for protein identification are reaching their limits, and IT methods for managing large amounts of data efficiently should be introduced into protein identification. In this paper, we study the feasibility of Database Management Systems (DBMS) as an alternative to conventional file-based approaches. We implement a connector interface and integrate it into the latest X! Tandem version (2017.02.01) to couple the search engine with a DBMS while keeping its business logic intact, and we study its performance. We compare our work with the original X! Tandem and with the MetaProteomeAnalyzer tool, which performs protein searches and uses a relational database for data storage. We observed no information loss with our approach and successfully integrated the DBMS connector interface into X! Tandem.
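To make the idea of a connector interface concrete, here is a minimal sketch, written in Python rather than X! Tandem's C++, of how search logic can be decoupled from storage so that a file backend and a DBMS backend become interchangeable. It is not the authors' implementation: every name (`SearchConnector`, `FileConnector`, `SQLiteConnector`, `toy_search`), the tab-separated text format and the table layout are hypothetical, SQLite merely stands in for "a DBMS", and the scoring is a toy precursor-mass match, not X! Tandem's scoring.

```python
import io
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Spectrum:
    spectrum_id: str
    precursor_mz: float
    peaks: tuple  # ((mz, intensity), ...)

@dataclass(frozen=True)
class PeptideHit:
    spectrum_id: str
    peptide: str
    score: float

class SearchConnector(ABC):
    """Hypothetical storage interface; the search logic only talks to this."""

    @abstractmethod
    def load_spectra(self) -> list: ...

    @abstractmethod
    def store_hits(self, hits: list) -> None: ...

    @abstractmethod
    def fetch_hits(self) -> list: ...

class FileConnector(SearchConnector):
    """File-style backend: tab-separated text (in-memory streams stand in for files)."""

    def __init__(self, spectra_text: str):
        self._spectra_text = spectra_text
        self._out = io.StringIO()

    def load_spectra(self):
        spectra = []
        for line in self._spectra_text.strip().splitlines():
            sid, prec, peaks = line.split("\t")
            pairs = tuple(tuple(float(x) for x in p.split(":")) for p in peaks.split(","))
            spectra.append(Spectrum(sid, float(prec), pairs))
        return spectra

    def store_hits(self, hits):
        for h in hits:
            # repr() keeps full float precision, so scores survive the round trip
            self._out.write(f"{h.spectrum_id}\t{h.peptide}\t{h.score!r}\n")

    def fetch_hits(self):
        rows = (line.split("\t") for line in self._out.getvalue().splitlines())
        return [PeptideHit(sid, pep, float(score)) for sid, pep, score in rows]

class SQLiteConnector(SearchConnector):
    """DBMS backend using SQLite; the schema is a hypothetical example."""

    def __init__(self, spectra: list, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            "CREATE TABLE spectrum (id TEXT PRIMARY KEY, precursor_mz REAL);"
            "CREATE TABLE peak (spectrum_id TEXT REFERENCES spectrum(id), mz REAL, intensity REAL);"
            "CREATE TABLE hit (spectrum_id TEXT REFERENCES spectrum(id), peptide TEXT, score REAL);"
        )
        with self.conn:  # import all spectra in one committed transaction
            for s in spectra:
                self.conn.execute("INSERT INTO spectrum VALUES (?, ?)", (s.spectrum_id, s.precursor_mz))
                self.conn.executemany("INSERT INTO peak VALUES (?, ?, ?)",
                                      [(s.spectrum_id, mz, inten) for mz, inten in s.peaks])

    def load_spectra(self):
        rows = self.conn.execute("SELECT id, precursor_mz FROM spectrum ORDER BY rowid").fetchall()
        return [Spectrum(sid, prec, tuple(self.conn.execute(
                    "SELECT mz, intensity FROM peak WHERE spectrum_id = ? ORDER BY rowid", (sid,))))
                for sid, prec in rows]

    def store_hits(self, hits):
        with self.conn:
            self.conn.executemany("INSERT INTO hit VALUES (?, ?, ?)",
                                  [(h.spectrum_id, h.peptide, h.score) for h in hits])

    def fetch_hits(self):
        cur = self.conn.execute("SELECT spectrum_id, peptide, score FROM hit ORDER BY rowid")
        return [PeptideHit(*row) for row in cur]

def toy_search(connector: SearchConnector, candidates: dict, tol: float = 0.5) -> list:
    """Toy precursor-mass matcher standing in for the unchanged search logic."""
    hits = []
    for s in connector.load_spectra():
        for peptide, mass in candidates.items():
            delta = abs(s.precursor_mz - mass)
            if delta <= tol:
                hits.append(PeptideHit(s.spectrum_id, peptide, 1.0 - delta / tol))
    connector.store_hits(hits)
    return connector.fetch_hits()
```

Because `toy_search` depends only on the interface, running it once with `FileConnector(text)` and once with `SQLiteConnector(FileConnector(text).load_spectra())` should yield equal hit lists. That equality check is only a toy analogue of the no-information-loss comparison described in the abstract, not the evaluation performed in the paper.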
