An ontology approach to comparative phenomics in plants

Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.

Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.

Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
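
As a rough illustration of how ontology annotations make phenotypes from different species comparable, the following minimal sketch scores two annotated phenotypes by the overlap of their ontology terms and all ancestors of those terms. Everything in it is an assumption made for illustration: the toy ontology, the term names (placeholders, not real Plant Ontology or Phenotype and Trait Ontology identifiers), the example annotations, and the choice of a Jaccard index. The abstract does not state which similarity measure the authors used.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: each term maps to its direct parent terms.
# Names are illustrative placeholders, not real ontology identifiers.
PARENTS: Dict[str, Set[str]] = {
    "leaf": {"shoot organ"},
    "petal": {"shoot organ"},
    "shoot organ": {"plant structure"},
    "root": {"plant structure"},
    "plant structure": set(),
    "decreased size": {"size"},
    "increased size": {"size"},
    "size": {"quality"},
    "yellow": {"color"},
    "color": {"quality"},
    "quality": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every ancestor reachable via PARENTS."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def closure(annotations: Iterable[str]) -> Set[str]:
    """Union of the ancestor sets of all annotated terms."""
    result: Set[str] = set()
    for term in annotations:
        result |= ancestors(term)
    return result


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index over ancestor closures (one possible similarity measure)."""
    closure_a, closure_b = closure(a), closure(b)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


# Hypothetical annotations for three mutants (not taken from the paper's dataset).
maize_mutant = {"leaf", "decreased size"}
arabidopsis_mutant = {"leaf", "yellow"}
rice_mutant = {"root", "increased size"}

if __name__ == "__main__":
    print(jaccard_similarity(maize_mutant, arabidopsis_mutant))
    print(jaccard_similarity(maize_mutant, rice_mutant))
```

Because the score is computed over shared ancestors, two phenotypes that use different specific terms can still register as partly similar when those terms sit under a common parent. This is the property that a shared, taxonomically broad ontology provides and that free-text descriptions lack.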
