Using propensity score adjustment method in genetic association studies

BACKGROUND Statistical tests for single-locus disease association are mostly underpowered. If a disease-associated causal single nucleotide polymorphism (SNP) operates through a complex mechanism that involves multiple SNPs or possible environmental factors, its effect might be missed when the causal SNP is studied in isolation, without accounting for these unknown genetic influences. In this study, we attempt to address the reduced power inherent in single-point association studies by accounting for genetic influences that hinder the detection of a causal variant in single-point association analysis. Our method uses a propensity score (PS) to adjust for the effect of SNPs that influence the marginal association of a candidate marker. These SNPs might be in linkage disequilibrium (LD) and/or epistatic with the target SNP and have a joint interactive influence on the disease under study. We therefore propose a propensity score adjustment method (PSAM) as a dimension-reduction tool to improve the power of single-locus studies: an estimated PS adjusts for the influence of these SNPs while disease status is regressed on the target genetic locus. The resulting test therefore always has 1 degree of freedom.

RESULTS We assess PSAM under the null hypothesis of no disease association to confirm that it correctly controls the type I error rate (<0.05). PSAM displays reasonable power (>70%) and shows an average 15% improvement in power compared with the commonly used logistic regression method and PLINK under most simulated scenarios. Using the open-access multifactor dimensionality reduction dataset, PSAM displays improved significance for all disease loci. In a whole-genome study, PSAM identified 21 SNPs from the GAW16 NARAC dataset by reducing their original trend-test p-values from between 0.001 and 0.05 to less than 0.0009; 6 of these SNPs were further found to be associated with immunity and inflammation.

CONCLUSIONS PSAM improves the significance of single-locus association for causal SNPs that have only marginal single-point association by adjusting for the influence of other SNPs in a dataset. This would explain part of the missing heritability without the added model complexity that comes with large-scale multiple testing. The newly reported SNPs from the GAW16 data would provide evidence for further research to elucidate the etiology of rheumatoid arthritis. PSAM is proposed as an exploratory tool complementary to existing methods. A downloadable, user-friendly program, PSAM, written in SAS, is available for public use.
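
The two-stage idea described above (estimate a PS from the other SNPs, then regress disease status on the target locus while adjusting for that PS) can be illustrated with a minimal Python sketch. This is not the authors' SAS program. The abstract does not specify the PS model, the genotype coding, or the test statistic, so the sketch assumes a binary carrier coding for the target SNP, logistic models at both stages, and a Wald test on the target coefficient. All function names and the simulated data are hypothetical.

```python
import math
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; an intercept is added.

    Returns the coefficient vector and its estimated covariance matrix.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = sigmoid(X @ beta)
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov


def ps_adjusted_test(disease, target, others):
    """Hypothetical PS-adjusted single-locus test (assumed design, see lead-in)."""
    # Stage 1 (assumption): PS = P(target carrier | other SNPs), logistic model.
    ps_beta, _ = fit_logistic(others, target)
    ps = sigmoid(ps_beta[0] + others @ ps_beta[1:])
    # Stage 2: regress disease status on the target SNP plus the single PS covariate.
    beta, cov = fit_logistic(np.column_stack([target, ps]), disease)
    # Wald test on the target coefficient only, i.e. a 1-degree-of-freedom test.
    z = beta[1] / math.sqrt(cov[1, 1])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta[1], z, p_value, ps


# Simulated toy data (purely illustrative, not from any dataset in the paper).
rng = np.random.default_rng(0)
n = 2000
others = rng.binomial(2, 0.3, size=(n, 5))  # 5 additively coded SNPs
target = (rng.random(n) < sigmoid(-0.5 + 0.8 * others[:, 0])).astype(float)
disease = (
    rng.random(n) < sigmoid(-1.0 + 0.4 * target + 0.5 * others[:, 0])
).astype(float)

effect, z, p_value, ps = ps_adjusted_test(disease, target, others)
print(f"log-odds for target SNP: {effect:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```

However many SNPs enter the first stage, the second-stage model contains only the target SNP and one PS covariate, which is why the test on the target locus keeps a single degree of freedom.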
