Genome-wide association analysis with matched samples discloses additional novel risk loci

Genome-wide association studies have identified many candidate causal variants associated with common complex diseases and traits, but most of these findings have been drawn from nonrandomized case/control designs. In nonrandomized studies, comparisons between two groups can be misleading because the units in one group generally differ systematically from the units in the other. The propensity score is widely used to group case and control units so that comparisons are more direct and meaningful even in nonrandomized studies. Propensity score matching can therefore help prioritize additional disease-risk variants uncovered through sub-group analysis in genome-wide association studies. The aim of this work is to propose a post-hoc association test based on subsets of samples. To that end, this paper presents a new paradigm for a post-hoc genome-wide association test when the number of controls is larger than the number of cases: selecting control samples by equating the distribution of covariates in the case and control groups, and then re-performing the association analysis on these matched samples. We demonstrated the feasibility of this approach by applying it to 2,752 type II diabetes patients among 8,842 individuals from a Korean population. The genome-wide association approach with matched samples disclosed 9 additional novel variants, 7 of which had not been identified by the association test using the whole set of control samples. The process described here can be combined with other types of case/control studies that have extensive covariate information. This indicates that there is a possibility of obtaining additional candidate causal variants responsible for common diseases through genome-wide association analysis with matched samples.
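The matching-then-retesting idea summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the propensity model, the matching algorithm, or the association test, so this example assumes a logistic-regression propensity score, greedy 1:1 nearest-neighbour matching without replacement, and a 1-df allelic chi-square test. All function names (`fit_propensity`, `match_controls`, `allelic_chi2`) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, y, lr=0.1, epochs=500):
    """Fit P(case | covariates) by logistic regression (batch gradient descent).
    Returns weights with the intercept first. Assumes standardized covariates."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    return sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))


def match_controls(scores, y):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement. Assumes there are at least as many controls as cases."""
    cases = [i for i, yi in enumerate(y) if yi == 1]
    pool = [i for i, yi in enumerate(y) if yi == 0]
    matched = []
    for c in cases:
        best = min(pool, key=lambda k: abs(scores[k] - scores[c]))
        matched.append(best)
        pool.remove(best)
    return cases, matched


def allelic_chi2(genotypes, case_idx, control_idx):
    """1-df chi-square on the 2x2 allele-count table (a simple stand-in
    for whatever association test a real analysis would use).
    Genotypes are minor-allele counts (0, 1, or 2). Returns (statistic, p)."""
    a = sum(genotypes[i] for i in case_idx)      # minor alleles, cases
    b = 2 * len(case_idx) - a                    # major alleles, cases
    c = sum(genotypes[i] for i in control_idx)   # minor alleles, controls
    d = 2 * len(control_idx) - c                 # major alleles, controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) tail


# Synthetic illustration only: one standardized covariate, one SNP.
random.seed(0)
y = [1] * 50 + [0] * 200
X = [[random.gauss(0.8 if yi else 0.0, 1.0)] for yi in y]
snp = [sum(random.random() < 0.3 for _ in range(2)) for _ in y]

w = fit_propensity(X, y)
scores = [propensity(w, x) for x in X]
cases, matched = match_controls(scores, y)
all_controls = [i for i, yi in enumerate(y) if yi == 0]

print("whole controls  :", allelic_chi2(snp, cases, all_controls))
print("matched controls:", allelic_chi2(snp, cases, matched))
```

In a real genome-wide setting the test would be repeated for every variant and compared against a multiple-testing threshold; the sketch shows only how the matched control subset replaces the whole control set in a single test.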
