A simple yet accurate correction for winner's curse can predict signals discovered in much larger genome scans

Motivation: For genetic studies, statistically significant variants explain far less trait variance than ‘sub-threshold’ association signals. To dimension follow-up studies, researchers need to accurately estimate ‘true’ effect sizes at each SNP, e.g. the true mean of odds ratios (ORs)/regression coefficients (RRs) or Z-score noncentralities. Naïve estimates of effect sizes incur winner’s curse biases, which are reduced only by laborious winner’s curse adjustments (WCAs). Given that Z-score estimates can, in theory, be translated to other scales, we propose a simple method to compute WCA for Z-scores, i.e. their true means/noncentralities.

Results: On the Z-score scale, WCA shrinks Z-scores toward zero; on the P-value scale, multiple testing adjustment (MTA) shrinks P-values toward one, which corresponds to a Z-score of zero. Thus, WCA on the Z-score scale is a proxy for MTA on the P-value scale. Therefore, to estimate Z-score noncentralities for all SNPs in genome scans, we propose FDR Inverse Quantile Transformation (FIQT). It (i) performs the simpler MTA of P-values using FDR and (ii) obtains noncentralities by back-transforming the adjusted P-values to the Z-score scale. Realistic simulations suggest that, compared with competing methods, FIQT is (i) more accurate and (ii) orders of magnitude more computationally efficient. Practical application of FIQT to the Psychiatric Genomics Consortium schizophrenia cohort predicts a non-trivial fraction of sub-threshold signals which become significant in much larger supersamples.

Conclusions: FIQT is a simple, yet accurate, WCA method for Z-scores (and ORs/RRs, via simple transformations).

Availability and Implementation: A 10-line R function implementation is available at https://github.com/bacanusa/FIQT.

Contact: sabacanu@vcu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
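The two FIQT steps described in the Results, (i) FDR adjustment of P-values and (ii) back-transformation to the Z-score scale, can be sketched in a few lines. The following is a minimal Python illustration and not the authors' R function. It assumes two-sided P-values from a standard normal null and the Benjamini-Hochberg form of FDR adjustment; the names `bh_adjust`, `fiqt_sketch` and the `min_p` floor (used only to keep the quantile function away from zero) are hypothetical choices made for this sketch.

```python
import math
from statistics import NormalDist


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity and a cap at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt_sketch(z, min_p=1e-300):
    """Shrink Z-scores toward zero via FDR adjustment of their P-values.

    min_p is an assumed floor for this sketch, not a documented parameter.
    """
    std_normal = NormalDist()
    # Two-sided P-value: 2 * Phi(-|z|) = erfc(|z| / sqrt(2)), accurate in the tails.
    pvals = [max(math.erfc(abs(zi) / math.sqrt(2.0)), min_p) for zi in z]
    adj = bh_adjust(pvals)
    # Back-transform each adjusted P-value to a Z-score, keeping the original sign.
    return [
        math.copysign(-std_normal.inv_cdf(p / 2.0), zi)
        for zi, p in zip(z, adj)
    ]


if __name__ == "__main__":
    observed = [5.0, 3.0, -2.0, 0.5, -0.1]
    for z_obs, z_adj in zip(observed, fiqt_sketch(observed)):
        print(f"{z_obs:6.2f} -> {z_adj:6.3f}")
```

Because the adjusted P-value is never smaller than the raw one, each adjusted Z-score in this sketch is no larger in magnitude than the observed Z-score, which is the shrinkage toward zero that the abstract describes.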
