Gene-Centric Genomewide Association Study via Entropy

Genes are the functional units in most organisms. Compared with genetic variants located outside genes, genic variants are more likely to affect disease risk. The human HapMap project provides an unprecedented opportunity for genomewide genetic association studies aimed at elucidating disease etiology. Currently, most association studies at the single-nucleotide polymorphism (SNP) or haplotype level rely on the linkage information between SNP markers and disease variants, and association findings obtained in this way are difficult to replicate. Moreover, variants in genes might not be sufficiently covered by currently available methods. In this article, we present a gene-centric approach based on entropy statistics for identifying disease genes in a genomewide association study. The new entropy-based approach considers all genic variants within a gene simultaneously and bases its association test on the joint genotype distribution of those variants. A grouping algorithm based on a penalized entropy measure is proposed to reduce the dimension of the test statistic. Type I error rates and power of the entropy test are evaluated through extensive simulation studies. The results indicate that, with a reasonable sample size, the entropy test has stable power under different disease models. Compared with single SNP-based analysis, the gene-centric approach has greater power, especially when a gene contains more than one disease variant. As genomewide genic SNPs become available, our entropy-based gene-centric approach should provide a robust and computationally efficient method for gene-based genomewide association studies.
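To make the idea of an entropy measure on a joint genotype distribution concrete, the following is a minimal sketch, not the test statistic developed in this article. It assumes genotypes coded 0/1/2 per SNP, treats each individual's tuple of genotypes within a gene as one joint genotype, and compares the Shannon entropy of cases and controls with a permutation test. The function names (`joint_entropy`, `entropy_difference`, `permutation_p_value`) and the use of an absolute entropy difference are illustrative assumptions.

```python
import math
import random
from collections import Counter


def joint_entropy(genotypes):
    """Shannon entropy (in nats) of the empirical joint genotype distribution.

    genotypes: list of tuples, one tuple of per-SNP genotype codes
    (assumed 0/1/2) per individual.
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_difference(cases, controls):
    """Illustrative statistic (an assumption): absolute entropy difference."""
    return abs(joint_entropy(cases) - joint_entropy(controls))


def permutation_p_value(cases, controls, n_perm=1000, seed=0):
    """Permutation p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = entropy_difference(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = entropy_difference(pooled[:n_cases], pooled[n_cases:])
        if permuted >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data for a hypothetical gene with two SNPs (purely illustrative).
    cases = [(0, 0)] * 30 + [(1, 2)] * 10
    controls = [(0, 0)] * 10 + [(0, 1)] * 10 + [(1, 1)] * 10 + [(1, 2)] * 10
    print(permutation_p_value(cases, controls, n_perm=500))
```

An entropy difference alone cannot distinguish two distributions that have the same probabilities assigned to different joint genotypes, and the number of possible joint genotypes grows quickly with the number of SNPs in a gene, which is the dimensionality problem that the grouping step described above is meant to address.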
