The Impact of Stability Considerations on Genetic Fine-Mapping

Fine-mapping methods aim to identify the genetic variants responsible for complex traits following genetic association studies. They typically assume that confounding within the association study cohort has been sufficiently adjusted for, e.g., by regressing out the top principal components (i.e., residualization). Despite its widespread use, residualization may not remove all sources of confounding. Here, we propose a complementary stability-guided approach that does not rely on residualization and instead identifies variants that are consistently fine-mapped across different genetic backgrounds or environments. We demonstrate the utility of this approach by applying it to fine-map eQTLs in the GEUVADIS data. Using 378 different functional annotations of the human genome, including recent deep learning-based annotations (e.g., Enformer), we compare the enrichment of these annotations among variants on which the stability-guided and traditional residualization-based fine-mapping approaches agree against the enrichment among variants on which they disagree. We find that the stability approach enhances the power of traditional fine-mapping methods in identifying variants with functional impact. Finally, in cases where the two approaches report distinct variants, our approach identifies variants comparably enriched for functional annotations. Our findings suggest that the stability principle, as a conceptually simple device, complements existing approaches to fine-mapping, reinforcing recent advocacy of evaluating cross-population and cross-environment portability of biological findings. To support visualization and interpretation of our results, we provide a Shiny app, available at: https://alan-aw.shinyapps.io/stability_v0/.
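To make the notion of a "consistently fine-mapped" variant concrete, the following is a minimal sketch and not the authors' implementation. It assumes that some fine-mapping method has already produced per-variant posterior probabilities separately within each genetic background or environment ("slice"), and it operationalizes stability as choosing the variant whose worst-case posterior across slices is highest. The function names, the max-min rule, the variant labels, and all numbers are hypothetical and chosen for illustration only.

```python
from typing import Dict


def top_variant(posteriors: Dict[str, float]) -> str:
    """Return the variant with the highest posterior in a single analysis."""
    # Sort keys first so that ties are broken deterministically.
    return max(sorted(posteriors), key=lambda v: posteriors[v])


def stable_variant(posteriors_by_slice: Dict[str, Dict[str, float]]) -> str:
    """Return the variant whose minimum posterior across slices is highest.

    This max-min rule is an illustrative assumption, not the published method.
    """
    slices = list(posteriors_by_slice.values())
    if not slices:
        raise ValueError("need at least one slice")
    # Only variants scored in every slice can be called consistent.
    shared = set(slices[0]).intersection(*slices[1:])
    if not shared:
        raise ValueError("no variant is scored in every slice")
    return max(sorted(shared), key=lambda v: min(s[v] for s in slices))


# Hypothetical posteriors from a pooled, residualization-based analysis.
pooled = {"var_a": 0.5, "var_b": 0.4, "var_c": 0.1}

# Hypothetical posteriors from the same method run within two slices.
by_slice = {
    "slice_1": {"var_a": 0.6, "var_b": 0.3, "var_c": 0.1},
    "slice_2": {"var_a": 0.1, "var_b": 0.5, "var_c": 0.4},
}

pooled_pick = top_variant(pooled)
stable_pick = stable_variant(by_slice)
print(pooled_pick, stable_pick, pooled_pick == stable_pick)
```

In this toy input the pooled analysis and the stability rule select different variants, which corresponds to the "disagree" case whose functional enrichment the abstract compares against the "agree" case.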
