xQTLImp: efficient and accurate xQTL summary statistics imputation

Motivation: Quantitative trait locus (xQTL) analysis of multi-omic molecular features, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), is widely used to decipher the effect of genomic variation on multi-level molecular activities. However, missing genotypes and limited effective sample sizes substantially reduce the power to detect significant xQTLs in a single study or when integrating multiple studies for meta-analysis. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes and molecular traits, and no available implementation is suited to imputing xQTL summary statistics, which are becoming widely available and useful.

Results: We developed xQTLImp, a C++ software tool designed to efficiently impute xQTL summary statistics based on a multivariate Gaussian approximation. Experiments on a single-cell eQTL dataset demonstrated that a considerable number of novel significant eQTL associations can be rediscovered by xQTLImp.

Availability: Software is available at https://github.com/hitbc/xQTLimp.

Contact: ydwang@hit.edu.cn or jiajiepeng@nwpu.edu.cn

Supplementary information: Supplementary data are available online.
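To make the idea of multivariate Gaussian imputation concrete, the following is a minimal sketch of the generic conditional-mean approach used by summary-statistics imputation methods. It is an illustration under stated assumptions, not the xQTLImp implementation: the function names `solve` and `impute_z`, the `ridge` regularization parameter, and the quality score are hypothetical choices. Under the assumption that z-scores of nearby variants are jointly Gaussian with covariance given by a reference-panel LD matrix, the z-score of an untyped variant is estimated as a weighted sum of the typed z-scores, with weights obtained from the regularized LD matrix of the typed variants.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(ld_tt, ld_ut, z_t, ridge=0.1):
    """Conditional-mean estimate of one untyped variant's z-score.

    ld_tt : LD (correlation) matrix among typed variants, from a reference panel
    ld_ut : LD between the untyped variant and each typed variant
    z_t   : observed z-scores of the typed variants
    ridge : hypothetical regularization added to the diagonal of ld_tt
    Returns (z_hat, quality), where quality is ld_ut . weights.
    """
    n = len(z_t)
    reg = [[ld_tt[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    weights = solve(reg, ld_ut)  # (ld_tt + ridge * I)^-1 ld_ut
    z_hat = sum(w * z for w, z in zip(weights, z_t))
    quality = sum(w * c for w, c in zip(weights, ld_ut))
    return z_hat, quality


# Toy example: two uncorrelated typed variants, the first in LD 0.5 with the target.
z_hat, quality = impute_z([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0], [4.0, 1.0], ridge=0.0)
```

In this toy case the estimate is simply half of the first typed z-score, and the quality score equals the squared LD (0.25). With real data the LD matrices would be computed from a reference panel within a window around the molecular trait, and a nonzero `ridge` helps keep the system well conditioned.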
