From XML to RDF: how semantic web technologies will change the design of 'omic' standards

With the rapid, ongoing increase in both the volume and diversity of 'omic' data (genomics, transcriptomics, proteomics, and others), the development and adoption of data standards is of paramount importance for realizing the promise of systems biology. A recent trend in data standard development has been to use Extensible Markup Language (XML) as the preferred mechanism for defining data representations. But as illustrated here with a few examples from proteomics data, XML, being syntactic and document-centric, cannot achieve the level of interoperability required by highly dynamic, integrated bioinformatics applications. In the present article, we discuss why semantic web technologies, as recommended by the World Wide Web Consortium (W3C), extend current data standard technology for biological data representation and management.
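The contrast between document-centric XML and graph-based RDF can be made concrete with a minimal Python sketch that uses only the standard library. It assumes two hypothetical XML documents that record facts about the same protein under different element names. Each one needs its own schema-specific parsing code, but once both are mapped to RDF-style subject-predicate-object triples keyed by a shared URI, merging them reduces to a set union. The documents, the identifier `PROT-0001`, and the `example.org` namespaces are invented for illustration. A real pipeline would use an RDF library and community vocabularies rather than plain tuples.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML documents about the same protein. Each invents its own
# element names and nesting, so neither parser can read the other's file.
DOC_A = ("<protein><accession>PROT-0001</accession>"
         "<name>Example protein A</name></protein>")
DOC_B = ("<Identification><ProteinID>PROT-0001</ProteinID>"
         "<Score>87</Score></Identification>")

# Hypothetical namespaces; a real deployment would reuse agreed vocabularies.
EX = "http://example.org/omics#"
EX_PROTEIN = "http://example.org/protein/"


def triples_from_doc_a(xml_text):
    """Schema-specific mapping for DOC_A-style files to (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    subject = EX_PROTEIN + root.findtext("accession")
    return {(subject, EX + "name", root.findtext("name"))}


def triples_from_doc_b(xml_text):
    """Schema-specific mapping for DOC_B-style files to (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    subject = EX_PROTEIN + root.findtext("ProteinID")
    return {(subject, EX + "identificationScore", root.findtext("Score"))}


def describe(graph, subject):
    """Return every (predicate, object) pair stated about one subject URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


# Because both sources name the protein with the same URI, merging the two
# graphs is a plain set union; no document structure has to be reconciled.
graph = triples_from_doc_a(DOC_A) | triples_from_doc_b(DOC_B)

for predicate, obj in describe(graph, EX_PROTEIN + "PROT-0001"):
    print(predicate, obj)
```

In this sketch, schema knowledge is confined to the two mapping functions, while the merge and query steps are generic and would work unchanged for a third source. Storing the score as a plain string keeps the example short; in RDF proper it would be a typed literal (for example, `xsd:integer`).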
