eQTL Mapping via Effective SNP Ranking and Screening

Genome-wide eQTL mapping explores the relationship between gene expression values and DNA variants to understand the genetic causes of human disease. Because a large number of genes and DNA variants must be assessed simultaneously, current eQTL mapping methods often suffer from low detection power, especially for trans-eQTLs. In this paper, we propose a new method that uses advanced techniques from large-scale signal detection to exploit the structure of eQTL data and improve the power of eQTL mapping. The method greatly reduces the burden of joint modeling through a new ranking and screening strategy based on the higher criticism statistic. Simulation studies demonstrate the superior performance of our method in detecting true eQTLs at reduced computational expense. The method is also evaluated on HapMap eQTL data, and the results are compared with a database of known eQTLs.
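To make the ranking idea concrete, the sketch below computes the standard higher criticism statistic of Donoho and Jin (2004) from a vector of p-values and uses it to order SNPs. It is only an illustration under stated assumptions, not the procedure of this paper: the abstract does not specify how the p-values are obtained or how the screening threshold is chosen. The function names `higher_criticism` and `rank_snps_by_hc`, the SNP-by-gene layout of the p-value matrix, and the default `alpha0 = 0.5` are all hypothetical choices made for the example.

```python
import numpy as np


def higher_criticism(pvalues, alpha0=0.5):
    """Standard higher criticism statistic for one vector of p-values.

    HC = max over the smallest alpha0 * n sorted p-values of
         sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Clip away exact 0 and 1 so the denominator stays positive.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    # Maximise only over the smallest alpha0 fraction of p-values.
    k = max(1, int(np.floor(alpha0 * n)))
    return float(np.max(hc[:k]))


def rank_snps_by_hc(pvalue_matrix, alpha0=0.5):
    """Rank SNPs by the HC statistic of their per-gene p-values.

    Assumed layout (hypothetical): rows are SNPs, columns are genes.
    Returns SNP indices from largest to smallest HC, plus the scores.
    """
    pvalue_matrix = np.asarray(pvalue_matrix, dtype=float)
    scores = np.array([higher_criticism(row, alpha0) for row in pvalue_matrix])
    order = np.argsort(-scores)
    return order, scores
```

In this sketch, a SNP whose p-values across genes look uniform receives a small score, while a SNP with even a handful of very small p-values receives a large one. A screening step could then keep only the top-ranked SNPs for joint modeling; how many to keep is left open here.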
