Detecting epistasis via Markov bases

Rapid progress in genotyping techniques has enabled large genome-wide association studies. Existing methods often focus on determining associations between single loci and a specific phenotype. However, a particular phenotype is usually the result of complex relationships between multiple loci and the environment. In this paper, we describe a two-stage method for detecting epistasis that combines the traditional single-locus search with a search for multiway interactions. Our method is based on an extended version of Fisher's exact test. To perform this test, a Markov chain is constructed on the space of multidimensional contingency tables, using the elements of a Markov basis as moves. We test our method on simulated data and compare it to a two-stage logistic regression method and to a fully Bayesian method, showing that we are able to detect the interacting loci when other methods fail to do so. Finally, we apply our method to a genome-wide data set consisting of 685 dogs and identify epistasis associated with canine hair length for four pairs of SNPs.
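The Markov chain construction mentioned above can be illustrated in its simplest setting. The sketch below is not the authors' implementation: it assumes a two-way contingency table and the independence model, for which the Markov basis consists of the familiar moves that add +1/-1 on the corners of a 2x2 minor. The function names (`chi_square`, `walk_step`, `markov_basis_p_value`), the chi-square test statistic, and the step counts are all illustrative assumptions. The multidimensional tables treated in the paper need a larger Markov basis, which this sketch does not compute.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of a two-way table under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def walk_step(table, rng):
    """One Metropolis step using a basic 2x2 move; mutates `table` in place.

    The move adds +1 at (i1, j1) and (i2, j2) and -1 at (i1, j2) and (i2, j1),
    so all row and column sums are preserved. Because (i1, i2) is an ordered
    pair, both signs of each move are proposed with equal probability.
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return  # move would create a negative cell: stay put
    # Ratio of hypergeometric probabilities, proportional to 1 / prod(u_ij!)
    ratio = (table[i1][j2] * table[i2][j1]) / (
        (table[i1][j1] + 1) * (table[i2][j2] + 1)
    )
    if rng.random() < min(1.0, ratio):
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def markov_basis_p_value(observed, steps=20000, burn_in=2000, seed=0):
    """Estimate an exact-test p-value by sampling tables with fixed margins."""
    rng = random.Random(seed)
    table = [list(row) for row in observed]
    obs_stat = chi_square(table)
    hits = 0
    for step in range(steps + burn_in):
        walk_step(table, rng)
        # Count sampled tables at least as extreme as the observed one
        if step >= burn_in and chi_square(table) >= obs_stat - 1e-9:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    # Hypothetical genotype-by-phenotype counts, for illustration only
    print(markov_basis_p_value([[12, 3, 5], [4, 9, 7]]))
```

The estimate is the fraction of sampled tables whose statistic is at least as large as the observed one. A small value suggests that the observed table is unusual among all tables sharing its margins.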
