CANDID: a flexible method for prioritizing candidate genes for complex human traits

Genomewide studies and localized candidate gene approaches have become everyday study designs for identifying polymorphisms in genes that influence complex human traits. Yet the number of significant findings, together with the need to focus on smaller regions, generally requires that genes be prioritized for further study. Several candidate gene identification algorithms have been proposed in recent years to streamline this prioritization, but many suffer from limitations imposed by their source data or are difficult to use and understand. CANDID is a prioritization algorithm designed to produce impartial, accurate rankings of candidate genes that influence complex human traits. In its analysis, CANDID can draw on information from publications, protein domain descriptions, cross‐species conservation measures, gene expression profiles and protein‐protein interactions. Users may also supplement these data sources with results from linkage, association and other studies. CANDID was tested on well‐known complex trait genes using data from the Online Mendelian Inheritance in Man (OMIM) database. It was additionally evaluated in a modeled gene discovery setting, in which it ranked genes whose trait associations were published after CANDID's databases were compiled. In all settings, CANDID exhibited high sensitivity and specificity, indicating an improvement over previously published algorithms. Its accuracy and ease of use make CANDID a highly useful tool in study design and analysis for complex human traits. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
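The abstract does not say how CANDID combines its evidence sources, so the following is only a minimal sketch of multi-criteria gene prioritization in general, not CANDID's published scoring. It assumes that each criterion (for example literature, expression, or user-supplied linkage results) has already been converted into a per-gene score in [0, 1], and that criteria are combined by a user-weighted mean. A second function illustrates one way to express sensitivity and specificity for a ranking: the top fraction of ranked genes is treated as the predicted trait genes. The function names `rank_candidates` and `sensitivity_specificity`, the gene names, the weights and the scores are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical input format (not CANDID's): criterion name -> {gene: score},
# with every score already normalized to the range [0, 1].
CriterionScores = Dict[str, Dict[str, float]]
Ranking = List[Tuple[str, float]]


def rank_candidates(scores: CriterionScores, weights: Dict[str, float]) -> Ranking:
    """Rank genes by the weighted mean of their per-criterion scores.

    A gene absent from a criterion scores 0 for it (an assumption; a real
    tool might instead rescale by the criteria that actually have data).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    genes: Set[str] = set()
    for per_gene in scores.values():
        genes.update(per_gene)
    combined = {
        gene: sum(w * scores.get(name, {}).get(gene, 0.0) for name, w in weights.items())
        / total_weight
        for gene in genes
    }
    # Highest score first; ties broken by gene name so the order is reproducible.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


def sensitivity_specificity(
    ranking: Ranking, known: Set[str], top_fraction: float
) -> Tuple[float, float]:
    """Treat the top `top_fraction` of the ranking as predicted trait genes."""
    ranked = [gene for gene, _ in ranking]
    cutoff = math.ceil(len(ranked) * top_fraction)
    predicted = set(ranked[:cutoff])
    positives = known & set(ranked)  # known trait genes that were ranked
    negatives = set(ranked) - positives  # all other ranked genes
    if not positives or not negatives:
        raise ValueError("need at least one known and one other gene in the ranking")
    sensitivity = len(predicted & positives) / len(positives)
    specificity = len(negatives - predicted) / len(negatives)
    return sensitivity, specificity


# Illustrative usage with made-up scores for three hypothetical genes.
scores = {
    "literature": {"GENE_A": 1.0, "GENE_B": 0.25, "GENE_C": 0.5},
    "expression": {"GENE_A": 0.5, "GENE_B": 0.75, "GENE_C": 0.5},
    "linkage": {"GENE_B": 1.0},  # stands in for user-supplied study results
}
weights = {"literature": 2.0, "expression": 1.0, "linkage": 1.0}
ranking = rank_candidates(scores, weights)
print(ranking)  # [('GENE_A', 0.625), ('GENE_B', 0.5625), ('GENE_C', 0.375)]
print(sensitivity_specificity(ranking, {"GENE_B"}, top_fraction=0.5))  # (1.0, 0.5)
```

A weighted mean keeps each criterion's contribution easy to inspect and lets a user up-weight their own linkage or association evidence; how missing data and weights are handled in practice is a design choice this sketch does not settle.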
