Controlled vocabularies and ontologies in proteomics: Overview, principles and practice

This paper focuses on the use of controlled vocabularies (CVs) and ontologies, especially in proteomics and primarily in relation to the work of the Proteomics Standards Initiative (PSI). It describes the relevant proteomics standard formats and the ontologies used within them, and it discusses software and tools for working with these ontology files. The article also examines the “mapping files” that are used to ensure that the correct controlled vocabulary terms are placed within PSI standard files and that the MIAPE (Minimum Information about a Proteomics Experiment) requirements are fulfilled. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
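To make the mapping-file idea more concrete, here is a minimal Python sketch. It is not the PSI semantic validator's implementation. It assumes a tiny hypothetical OBO fragment (the `EX:` accessions are invented for illustration) and a simplified rule object whose fields are loosely modeled on mapping-file attributes assumed to be named `termAccession`, `allowChildren` and `useTerm`. The real mapping-file schema and validator may differ. Under these assumptions, a CV term is accepted if it is the rule's own term (when `use_term` is set) or one of that term's `is_a` descendants (when `allow_children` is set).

```python
from dataclasses import dataclass

# Hypothetical OBO fragment; the EX: accessions are invented for illustration.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: instrument model

[Term]
id: EX:0000002
name: vendor A instrument model
is_a: EX:0000001 ! instrument model

[Term]
id: EX:0000003
name: vendor A model X
is_a: EX:0000002 ! vendor A instrument model

[Term]
id: EX:0000010
name: ionization type
"""


def parse_obo(text: str) -> dict:
    """Map term id -> {"name": ..., "is_a": [parent ids]} for [Term] stanzas only."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # New stanza: collect [Term] stanzas, skip others (e.g. [Typedef]).
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! comment" that repeats the parent's name.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


def is_descendant(terms: dict, term_id: str, ancestor_id: str) -> bool:
    """True if ancestor_id is reachable from term_id by following is_a links."""
    stack = list(terms.get(term_id, {}).get("is_a", []))
    seen = set()
    while stack:
        parent = stack.pop()
        if parent == ancestor_id:
            return True
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return False


@dataclass
class MappingRule:
    """Simplified stand-in for one mapping-file rule (hypothetical structure)."""
    element_path: str                # where the cvParam sits; stored, not evaluated here
    term_accession: str              # the allowed parent term
    allow_children: bool = True      # accept is_a descendants of term_accession
    use_term: bool = False           # accept term_accession itself
    requirement_level: str = "MUST"  # stored only; not enforced in this sketch


def term_allowed(terms: dict, rule: MappingRule, accession: str) -> bool:
    """Check a single CV accession against one rule."""
    if accession == rule.term_accession:
        return rule.use_term
    return rule.allow_children and is_descendant(terms, accession, rule.term_accession)


if __name__ == "__main__":
    terms = parse_obo(OBO_TEXT)
    rule = MappingRule("/illustrative/path/cvParam/@accession", "EX:0000001")
    print(term_allowed(terms, rule, "EX:0000003"))  # True: grandchild of the rule term
    print(term_allowed(terms, rule, "EX:0000010"))  # False: unrelated branch
    print(term_allowed(terms, rule, "EX:0000001"))  # False: use_term is off
```

The `is_a` walk is what lets a single rule accept any vendor-specific instrument model without listing each term individually. A real validator would also evaluate the element path, requirement level and cardinality. This sketch only stores those values.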
