Finding Unique Filter Sets in PLATO: A Precursor to Efficient Interaction Analysis in GWAS Data

Methods for detecting gene-gene interactions between variants in genome-wide association study (GWAS) datasets remain underdeveloped. PLATO, the Platform for the Analysis, Translation and Organization of large-scale data, is a filter-based method that brings together many analytical methods in an effort to solve this problem. PLATO filters a large genomic dataset down to a subset of genetic variants that may be useful for interaction analysis. As a precursor to using PLATO to detect gene-gene interactions, we implemented a variety of single-locus filters and evaluated them as a proof of concept. To streamline PLATO for efficient epistasis analysis, we determined which of 24 analytical filters produced redundant results. Using a kappa score to measure agreement between filters, we grouped the analytical filters into four filter classes; all further analyses therefore employed four filters, one per class. We then tested the MAX statistic put forth by Sladek et al. (1) in simulated data exploring a number of genetic models of modest effect size. To compute the MAX statistic, the four filters were run on each SNP in each dataset, and the smallest p-value among the four results was taken as the final result. Because taking the smallest of four p-values is itself a form of multiple testing, permutation testing was performed to determine the p-value empirically. The power of the MAX statistic to detect each of the simulated effects was determined, along with the Type I error and false positive rates. The results of this simulation study demonstrate that PLATO, using the four filters combined through the MAX statistic, has higher power on average to find multiple types of effects, and a lower false positive rate, than any of the individual filters alone. In the future we will extend PLATO with the MAX statistic to interaction analyses for large-scale genomic datasets.
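The MAX-statistic procedure described above (smallest p-value across filters, then a permutation test) can be sketched as follows. This is a minimal illustration under stated assumptions, not PLATO's implementation: the abstract does not name the four filters, so the genotypic, dominant, recessive, and allelic chi-square tests below are hypothetical stand-ins, and every function name is invented for this example. Genotypes are assumed to be coded 0/1/2 and case-control status 0/1, with at least one case and one control.

```python
import math
import random


def chi2_pvalue(stat, df):
    """Upper-tail chi-square p-value using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or 2")


def table_pvalue(table):
    """Pearson chi-square p-value for a 2 x k count table (empty columns dropped)."""
    cols = [c for c in zip(*table) if sum(c) > 0]
    df = len(cols) - 1
    if df == 0:
        return 1.0  # monomorphic marker: nothing to test
    row_totals = [sum(c[r] for c in cols) for r in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for c in cols:
        for r in range(2):
            expected = row_totals[r] * sum(c) / n
            stat += (c[r] - expected) ** 2 / expected
    return chi2_pvalue(stat, df)


def genotype_table(genotypes, status):
    """Rows: status 0 (control) and 1 (case); columns: genotype 0, 1, 2."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, s in zip(genotypes, status):
        table[s][g] += 1
    return table


# Hypothetical stand-in filters (assumption: not the paper's four filter classes).
def genotypic(t):
    return table_pvalue(t)


def dominant(t):
    return table_pvalue([[r[0], r[1] + r[2]] for r in t])


def recessive(t):
    return table_pvalue([[r[0] + r[1], r[2]] for r in t])


def allelic(t):
    return table_pvalue([[2 * r[0] + r[1], r[1] + 2 * r[2]] for r in t])


FILTERS = (genotypic, dominant, recessive, allelic)


def max_statistic(genotypes, status, filters=FILTERS):
    """Smallest p-value across all filters for a single SNP."""
    table = genotype_table(genotypes, status)
    return min(f(table) for f in filters)


def permutation_pvalue(genotypes, status, n_perm=199, seed=0, filters=FILTERS):
    """Empirical p-value: share of label permutations with a min-p at least as small."""
    rng = random.Random(seed)
    observed = max_statistic(genotypes, status, filters)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_statistic(genotypes, shuffled, filters) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 50 controls enriched for genotype 0, 50 cases enriched for genotype 2.
toy_genotypes = [0] * 30 + [1] * 15 + [2] * 5 + [0] * 5 + [1] * 15 + [2] * 30
toy_status = [0] * 50 + [1] * 50
print(max_statistic(toy_genotypes, toy_status))
print(permutation_pvalue(toy_genotypes, toy_status))
```

Shuffling the case-control labels and recomputing the minimum over all filters on each permutation is what accounts for having selected the best of several tests; comparing the raw minimum p-value against a nominal threshold would inflate the Type I error.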

[1]  Alison A Motsinger,et al.  Multifactor dimensionality reduction for detecting gene-gene and gene-environment interactions in pharmacogenomics studies. , 2005, Pharmacogenomics.

[2]  R. Felder,et al.  Combinations of variations in multiple genes are associated with hypertension. , 2000, Hypertension.

[3]  A. D. Forbes,et al.  Classification-algorithm evaluation: five performance measures based on confusion matrices. , 1995 .

[4]  T. Wickens Multiway Contingency Tables Analysis for the Social Sciences , 1989 .

[5]  Jason H. Moore,et al.  The Ubiquitous Nature of Epistasis in Determining Susceptibility to Common Human Diseases , 2003, Human Heredity.

[6]  Marylyn D. Ritchie,et al.  Data Simulation Software for Whole-Genome Association and Other Studies in Human Genetics , 2005, Pacific Symposium on Biocomputing.

[7]  E R McCabe,et al.  Phenotypes of patients with "simple" Mendelian disorders are complex traits: thresholds, modifiers, and systems dynamics. , 2000, American journal of human genetics.

[8]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[9]  Marylyn D. Ritchie,et al.  Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction , 2008, BMC Bioinformatics.

[10]  David Altshuler,et al.  Once and again-issues surrounding replication in genetic association studies. , 2002, The Journal of clinical endocrinology and metabolism.

[11]  David H. Wolpert,et al.  No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..

[12]  Marylyn D. Ritchie,et al.  Generating Linkage Disequilibrium Patterns in Data Simulations Using genomeSIMLA , 2008, EvoBIO.

[13]  J. H. Moore,et al.  Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. , 2001, American journal of human genetics.

[14]  P. Armitage Tests for Linear Trends in Proportions and Frequencies , 1955 .

[15]  David M. Reif,et al.  A comparison of analytical methods for genetic association studies , 2008, Genetic epidemiology.

[16]  William Shannon,et al.  Detecting epistatic interactions contributing to quantitative traits , 2004, Genetic epidemiology.

[17]  Alan Agresti,et al.  Categorical Data Analysis , 2003 .

[18]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[19]  Marylyn D. Ritchie,et al.  Biofilter: A Knowledge-Integration System for the Multi-Locus Analysis of Genome-Wide Association Studies , 2009, Pacific Symposium on Biocomputing.

[20]  Minerva M. Carrasquillo,et al.  Genome-wide association study and mouse model identify interaction between RET and EDNRB pathways in Hirschsprung disease , 2002, Nature Genetics.

[21]  A. Dunker The Pacific Symposium on Biocomputing , 1998 .

[22]  Z. Anusz [Statistics in epidemiology]. , 1974, Pielegniarka i polozna.

[23]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.

[24]  T. Hudson,et al.  A genome-wide association study identifies novel risk loci for type 2 diabetes , 2007, Nature.

[25]  E. McCabe,et al.  Modifier genes convert "simple" Mendelian disorders to complex traits. , 2000, Molecular genetics and metabolism.

[26]  Eric J Duell,et al.  Detecting Pathway-Based Gene-Gene and Gene-Environment Interactions in Pancreatic Cancer , 2008, Cancer Epidemiology Biomarkers & Prevention.

[27]  Joseph L. Gastwirth,et al.  Trend Tests for Case-Control Studies of Genetic Markers: Power, Sample Size and Robustness , 2002, Human Heredity.