The Informative Extremes: Using Both Nearest and Farthest Individuals Can Improve Relief Algorithms in the Domain of Human Genetics

A primary goal of human genetics is the discovery of genetic factors that influence individual susceptibility to common human diseases. This problem is difficult because common diseases likely result from the joint failure of two or more interacting components rather than from single-component failures. Efficient algorithms that can detect interacting attributes are therefore needed. The Relief family of machine learning algorithms, which uses nearest neighbors to weight attributes, is a promising approach. Recently, an improved Relief algorithm called Spatially Uniform ReliefF (SURF) was developed that significantly increases the ability of these algorithms to detect interacting attributes. Here we introduce an algorithm called SURF*, which uses distant instances along with the usual nearby ones to weight attributes. The weighting depends on whether the instances are nearby or distant. We show that this new algorithm significantly outperforms both ReliefF and SURF for genetic analysis in the presence of attribute interactions. We make SURF* freely available in the open source MDR software package. MDR is a cross-platform Java application that features a user-friendly graphical interface.
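
The idea of weighting attributes from both nearby and distant instances can be illustrated with a minimal Python sketch. It is not the MDR implementation, and the abstract does not give the exact update rule. The sketch assumes discrete genotypes, a Hamming distance, a near/far threshold equal to the mean pairwise distance, and an update for distant instances that is the sign-reversed version of the update for nearby ones. The function name `surf_star_weights` and the normalization are hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def surf_star_weights(X, y):
    """Relief-style attribute weights from near AND far instances (sketch).

    X: (n, p) array of discrete attribute values (e.g. genotypes).
    y: (n,) array of class labels.
    Assumptions (not taken from the abstract): Hamming distance, a
    near/far threshold at the mean pairwise distance, and reversed-sign
    updates for far instances.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape

    # diff[i, j, k] is True when instances i and j differ on attribute k.
    diff = X[:, None, :] != X[None, :, :]
    dist = diff.sum(axis=2)  # Hamming distance between every pair

    # Threshold: mean distance over all distinct pairs.
    upper = np.triu_indices(n, k=1)
    threshold = dist[upper].mean()

    weights = np.zeros(p)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_class = y[i] == y[j]
            near = dist[i, j] < threshold  # ties count as far
            # Near miss or far hit: differing attributes gain weight.
            # Near hit or far miss: differing attributes lose weight.
            sign = 1.0 if near != same_class else -1.0
            weights += sign * diff[i, j]

    return weights / (n * (n - 1))


# Toy data: all 16 combinations of four binary attributes. The class is the
# XOR of attributes 0 and 1 (a pure interaction); attributes 2 and 3 are noise.
X = np.array(list(itertools.product([0, 1], repeat=4)))
y = X[:, 0] ^ X[:, 1]
w = surf_star_weights(X, y)
print(w)
```

On this toy data set the two interacting attributes receive higher weights than the two noise attributes, even though neither interacting attribute is associated with the class on its own.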
