Understanding the Systems Biology of Pathogen Virulence Using Semantic Methodologies

Systems biology approaches to the integrative study of cells, organs and organisms offer the best means of understanding, in a holistic manner, the diversity of molecular assays that can now be implemented in a high-throughput manner. Such assays can sample the genome, epigenome, proteome, metabolome and microbiome contemporaneously, allowing us for the first time to perform a complete analysis of physiological activity. The central problem remains empowering the scientific community to actually implement such an integration across seemingly diverse data types and measurements. One promising solution is to apply semantic techniques to a self-consistent and implicitly correct ontological representation of these data types. In this paper we describe how we have applied one such solution, built on the InterMine data warehouse platform, which uses the Sequence Ontology as its basis, to facilitate a systems biology analysis of virulence in the apicomplexan pathogen Toxoplasma gondii, a common parasite that infects up to half the world's population and poses acute pathogenic risks for immunocompromised individuals or pregnant women. Our solution, which we named 'toxoMine', has provided a platform for our collaborators to perform such integrative analyses, as well as opportunities to develop this cyberinfrastructure further, particularly by exploiting semantic similarities that are of value to knowledge discovery in the omics enterprise. We discuss these opportunities in the context of further enhancing the capabilities of this powerful integrative platform.
