Spatially Uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions

Background
Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to result from single points of failure and instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) that confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however, and methods that exhaustively analyze sets of these variants for interactions are combinatorial in nature, which makes them computationally infeasible. Efficient algorithms that can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate on noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF).

Results
SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. Importantly, this increase in success rate does not require an increase in algorithmic complexity, and it is achieved even though a nuisance parameter is removed from the algorithm.

Conclusion
Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive Multifactor Dimensionality Reduction (MDR) analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source MDR software package available from http://www.epistasis.org.
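
The abstract does not spell out how SURF scores SNPs, so the following is only a minimal sketch of the idea as we understand it: where ReliefF compares each individual with a fixed number k of nearest neighbors (the nuisance parameter), SURF compares it with every individual closer than a distance threshold, taken here to be the mean pairwise distance in the dataset. The function name `surf_scores`, the 0/1/2 genotype coding, the strict `<` comparison, and the per-individual averaging over hits and misses are assumptions made for illustration; this is not the code shipped in the MDR package.

```python
import numpy as np


def surf_scores(X, y):
    """Illustrative SURF-style attribute weights (hypothetical helper).

    X: (n_individuals, n_snps) array of discrete genotypes, e.g. coded 0/1/2.
    y: (n_individuals,) array of class labels (case/control).
    Returns one weight per SNP; larger means more relevant.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape

    # diff() for discrete attributes: True where two genotypes differ.
    # Note: this holds an n x n x p boolean array, fine for small demos only.
    mismatch = X[:, None, :] != X[None, :, :]
    dist = mismatch.sum(axis=2)  # Hamming distance between individuals

    # Threshold = mean distance over all distinct pairs (no k to tune).
    threshold = dist[np.triu_indices(n, k=1)].mean()

    weights = np.zeros(p)
    for i in range(n):
        near = dist[i] < threshold
        near[i] = False  # an individual is not its own neighbor
        hits = near & (y == y[i])     # neighbors with the same class
        misses = near & (y != y[i])   # neighbors with a different class
        # Assumed normalization: average over hits and over misses.
        if hits.any():
            weights -= mismatch[i, hits].mean(axis=0) / n
        if misses.any():
            weights += mismatch[i, misses].mean(axis=0) / n
    return weights


if __name__ == "__main__":
    # Toy data: column 0 determines the class, column 2 is constant.
    X = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]])
    y = np.array([0, 0, 1, 1])
    print(surf_scores(X, y))
```

On this toy input the first column receives the highest score and the constant column scores exactly zero. In a filtering workflow, the top-ranked SNPs would be the ones passed on to an exhaustive analysis such as MDR.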
