Prioritization of Retinal Disease Genes: An Integrative Approach

Discovering novel disease-associated gene variants is often a daunting task in highly heterogeneous disease classes. We seek a generalizable algorithm that integrates multiple publicly available genomic data sources into a machine-learning model for prioritizing candidates identified in patients with retinal disease. To this end, we build feature vectors from publicly available microarray, RNA-seq, and ChIP-seq datasets of biological relevance to retinal disease. These features capture patterns of gene expression specificity across tissues of the body and the eye, as well as photoreceptor-specific signals from the CRX transcription factor. Using these features, we describe a novel algorithm, positive and unlabeled learning for prioritization (PULP). We compare several popular supervised learning techniques as the regression function for PULP. With logistic regression, the results show a highly significant enrichment for previously characterized disease genes. Finally, in a comparison with the popular gene prioritization tool ENDEAVOUR, PULP shows superior prioritization of retinal disease genes from previous studies. The Java source code, compiled binary, assembled feature vectors, and instructions are available online at https://github.com/ahwagner/PULP.
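To make the general idea of positive and unlabeled prioritization concrete, the following is a minimal sketch and not the PULP implementation itself. It assumes the simplest form of the approach: known disease genes are labeled 1, every other gene is treated as unlabeled (0), a ridge-penalized logistic regression is trained to separate the two groups, and the unlabeled genes are then ranked by their predicted score. The function names (`make_toy_genes`, `fit_logistic`, `prioritize`), the two synthetic features, and all hyperparameters are hypothetical and chosen only for illustration; the real feature vectors are derived from the expression and ChIP-seq data described above.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, s, epochs=300, lr=0.1, l2=1e-3):
    """Ridge-penalized logistic regression fitted by batch gradient descent.

    X is a list of feature vectors; s holds 1.0 for labeled (known positive)
    examples and 0.0 for unlabeled ones.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(X, s):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    # Predicted probability that a gene resembles the labeled positives.
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def prioritize(features, known_positive):
    """Train labeled-vs-unlabeled, then rank unlabeled genes by score."""
    genes = sorted(features)
    X = [features[g] for g in genes]
    s = [1.0 if g in known_positive else 0.0 for g in genes]
    model = fit_logistic(X, s)
    candidates = [g for g in genes if g not in known_positive]
    return sorted(candidates, key=lambda g: score(model, features[g]),
                  reverse=True)


def make_toy_genes(seed=0):
    """Synthetic data (an assumption for this sketch, not real gene features).

    30 'disease-like' genes have higher feature values; 20 of them are
    known positives and 10 are hidden among 100 background genes.
    """
    rng = random.Random(seed)
    features, known, hidden = {}, set(), set()
    for i in range(130):
        gene = f"G{i:03d}"
        is_disease = i < 30
        mu = 2.0 if is_disease else 0.0
        features[gene] = [rng.gauss(mu, 0.5), rng.gauss(mu, 0.5)]
        if i < 20:
            known.add(gene)
        elif is_disease:
            hidden.add(gene)
    return features, known, hidden


features, known, hidden = make_toy_genes()
ranking = prioritize(features, known)
print("Top 10 candidates:", ranking[:10])
print("Hidden positives in top 10:", len(set(ranking[:10]) & hidden))
```

In this toy setting the hidden positives share the feature profile of the known positives, so they should rise toward the top of the ranking even though they were treated as unlabeled during training. Whether real retinal disease genes behave this way depends on how well the chosen features separate them from the background.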
