FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Background: Nucleotide and protein sequence feature annotations are essential for understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples.

Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned “omics” areas. Representing sequence positions in a single data format that is independent of file formats allows us to integrate sequence data from multiple sources and data types. We use the genome browser JBrowse to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations.

Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
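To make the idea of a location expressed as subject-predicate-object triples concrete, the following is a minimal sketch and not the authors' implementation. It uses plain Python tuples rather than an RDF library. The FALDO term names (`Region`, `ExactPosition`, `ForwardStrandPosition`, `ReverseStrandPosition`, `location`, `begin`, `end`, `position`, `reference`) come from the published ontology; the `http://example.org/` namespace, the `gene1` and `chr1` identifiers, the coordinates, and the helper `region_triples` are hypothetical and chosen only for illustration.

```python
# Minimal sketch: a feature location as FALDO-style triples (plain tuples).
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def region_triples(feature, reference, begin, end, forward=True):
    """Return triples that locate `feature` on the sequence `reference`."""
    region = feature + "/region"  # hypothetical naming scheme for the region node
    strand = FALDO + ("ForwardStrandPosition" if forward else "ReverseStrandPosition")
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
    ]
    # Each boundary of the region is its own position node, so that strand
    # and reference sequence are stated per position.
    for edge, coordinate in (("begin", begin), ("end", end)):
        position = f"{region}/{edge}"
        triples += [
            (region, FALDO + edge, position),
            (position, RDF_TYPE, FALDO + "ExactPosition"),
            (position, RDF_TYPE, strand),
            (position, FALDO + "position", coordinate),
            (position, FALDO + "reference", reference),
        ]
    return triples


# Hypothetical feature and coordinates.
triples = region_triples(EX + "gene1", EX + "chr1", 100, 250)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Because every position carries its own reference and strand, the same structure can in principle describe features from different sources in one graph, which is the property the abstract relies on for merging annotations.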
