The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data

On behalf of the Human Proteome Organization (HUPO) Proteomics Standards Initiative, we introduce two novel standard data formats, proBAM and proBed, developed to address the current challenges of integrating mass spectrometry-based proteomics data with genomics and transcriptomics information in proteogenomics studies. proBAM and proBed are adaptations of the well-defined, widely used SAM/BAM and BED file formats, respectively, and each has been extended to meet the specific requirements of proteomics data. As a result, popular existing genomics tools such as SAMtools and BEDTools, as well as several widely used genome browsers, can already manipulate and visualize these formats "out of the box." We also highlight a number of dedicated software tools that properly support the proteomics information in these formats and provide functionality such as file generation, file conversion, and data analysis. All related documentation, including the detailed file format specifications and example files, is available at http://www.psidev.info/probam and http://www.psidev.info/probed.
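The "out of the box" compatibility follows from the layout of proBed: each line starts with the 12 standard BED columns, which generic BED tools read as usual, followed by extra tab-separated columns carrying proteomics information (for example, protein accession and peptide sequence). The sketch below illustrates that split. It assumes a BED12+N layout and does not interpret the extra columns; their exact names and order should be taken from the proBed specification at http://www.psidev.info/probed. The function names `parse_probed_line` and `peptide_exons` are hypothetical, and the example line is made up and truncated to two extra columns for brevity.

```python
from dataclasses import dataclass
from typing import List

# The 12 standard BED columns, in order (BED12).
BED12_FIELDS = (
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
)


@dataclass
class ProBedRecord:
    bed: dict          # BED12 part that generic BED tools understand
    extra: List[str]   # trailing proteomics-specific columns, kept as raw strings


def parse_probed_line(line: str) -> ProBedRecord:
    """Split one tab-separated proBed line into its BED12 core and extra columns.

    Assumption: the line is BED12+N (12 standard BED columns followed by
    proteomics columns); only the BED12 part is interpreted here.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(cols)}")
    bed = dict(zip(BED12_FIELDS, cols[:12]))
    # Coordinates are 0-based, half-open integers in BED.
    for key in ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"):
        bed[key] = int(bed[key])
    # blockSizes/blockStarts are comma-separated lists, often with a trailing comma.
    for key in ("blockSizes", "blockStarts"):
        bed[key] = [int(x) for x in bed[key].split(",") if x]
    return ProBedRecord(bed=bed, extra=cols[12:])


def peptide_exons(rec: ProBedRecord) -> List[tuple]:
    """Return absolute (start, end) genomic intervals, one per block (e.g., exon)."""
    start = rec.bed["chromStart"]
    return [(start + s, start + s + size)
            for s, size in zip(rec.bed["blockStarts"], rec.bed["blockSizes"])]


if __name__ == "__main__":
    # Hypothetical peptide spanning two exons (30 nt + 15 nt = 15 amino acids).
    line = ("chr1\t1000\t1100\tpep1\t1000\t+\t1000\t1100\t0\t2\t30,15,\t0,85,"
            "\tP12345\tACDEFGHIKLMNPQR")
    rec = parse_probed_line(line)
    print(peptide_exons(rec))
    print(rec.extra)
```

Keeping the extra columns as raw strings mirrors how a generic BED tool treats them: the genomic part is fully usable, while proteomics-aware software can interpret the remaining fields according to the specification.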
