Using association rule mining to determine promising secondary phenotyping hypotheses

Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics Project are ongoing efforts to identify how genes, and modifications to them, influence phenotypes. Gene–phenotype relations are crucial for improving our understanding of heritable human diseases and for developing drugs. However, higher vertebrate genomes contain ∼20 000 genes, and experimentally verifying gene–phenotype relations is resource-intensive. Methods are therefore needed to select good candidates for testing.

Results: In this study, we applied association rule mining to identify promising secondary phenotype candidates. The predictions rely on a large set of gene–phenotype annotations, which is used to find co-occurrence patterns of phenotypes. With this approach, we identified 1967 secondary phenotype hypotheses covering 244 genes and 136 phenotypes. Using two automated evaluation strategies and one manual one, we demonstrate that the secondary phenotype candidates are biologically relevant to the genes for which they are predicted. We conclude that the predicted secondary phenotypes are good candidates for experimental testing and confirmation.

Availability: The secondary phenotype candidates can be browsed at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list.

Contact: ao5@sanger.ac.uk or ds5@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
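To make the approach concrete, here is a minimal Python sketch of how association rules mined from gene–phenotype annotations can yield secondary phenotype hypotheses. It is not the authors' implementation. It treats each gene as a transaction whose items are its annotated phenotypes. It also restricts itself to single-antecedent rules {a} ⇒ {b} instead of the arbitrary-size itemsets an Apriori-style miner would find, and it ignores the phenotype ontology hierarchy. The function names, thresholds, and the gene and phenotype labels in `EXAMPLE_ANNOTATIONS` are hypothetical. A rule's support is the number of genes annotated with both phenotypes, and its confidence is that count divided by the number of genes annotated with the antecedent. Any gene that has the antecedent but lacks the consequent becomes a candidate for testing the consequent as a secondary phenotype.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: gene -> set of annotated phenotypes (placeholder labels).
EXAMPLE_ANNOTATIONS = {
    "gene1": {"P_A", "P_B", "P_C"},
    "gene2": {"P_A", "P_B"},
    "gene3": {"P_A", "P_B", "P_C"},
    "gene4": {"P_A", "P_C"},
    "gene5": {"P_A"},
}


def mine_pairwise_rules(annotations, min_support=2, min_confidence=0.5):
    """Return rules (a, b, support, confidence) meaning {a} => {b}.

    support    = number of genes annotated with both a and b
    confidence = support / number of genes annotated with a
    """
    item_counts = Counter()
    pair_counts = Counter()
    for phenotypes in annotations.values():
        items = sorted(set(phenotypes))  # canonical order: each unordered pair gets one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (x, y), both in pair_counts.items():
        if both < min_support:
            continue  # pair co-occurs too rarely to be frequent
        for a, b in ((x, y), (y, x)):  # each frequent pair gives two directed rules
            confidence = both / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, both, confidence))
    return rules


def secondary_hypotheses(annotations, rules):
    """Map gene -> {phenotype: best confidence} for rule consequents the gene lacks."""
    hypotheses = {}
    for gene, phenotypes in annotations.items():
        observed = set(phenotypes)
        for a, b, _support, confidence in rules:
            if a in observed and b not in observed:
                candidates = hypotheses.setdefault(gene, {})
                # keep the highest-confidence rule supporting this hypothesis
                candidates[b] = max(candidates.get(b, 0.0), confidence)
    return hypotheses


if __name__ == "__main__":
    rules = mine_pairwise_rules(EXAMPLE_ANNOTATIONS)
    for gene, candidates in sorted(secondary_hypotheses(EXAMPLE_ANNOTATIONS, rules).items()):
        print(gene, sorted(candidates.items()))
```

Note that a rule with confidence 1 has no counterexample genes and therefore produces no hypotheses. Only rules that hold for most, but not all, annotated genes point to phenotypes worth testing. Ranking each hypothesis by the highest-confidence supporting rule is one simple choice among several.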