An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci

Genome-wide expression quantitative trait loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce false-positive broad impact eQTL, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI performs better than other mixed model confounding factor methods at recovering broad impact eQTL from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes and Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. In these analyses, we were able to identify and replicate a few broad impact eQTL, although the overall number was small even when applying CONFETI. In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL.
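
The role of ICA in building the random-effect covariance can be illustrated with a small sketch. The abstract does not give the procedure in detail, so the following is an illustration under stated assumptions, not the CONFETI implementation: expression is decomposed with a plain NumPy symmetric FastICA, components whose sample loadings correlate with any genotype (Fisher z test, Bonferroni corrected) are treated as candidate genetic signal and left out, and the sample covariance matrix K is computed from the remaining components. The function names, the number of components, the association test, the threshold, and the simulated data are all hypothetical choices made for this example.

```python
import math
import numpy as np


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(np.clip(s, 1e-12, None))) @ u.T @ W


def fast_ica(Y, k, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh contrast) on Y of shape (n_samples, n_genes).

    Returns A (n_samples x k) and S (k x n_genes) with Yc ~= A @ S, where Yc
    is Y with each sample centered across genes.
    """
    rng = np.random.default_rng(seed)
    g = Y.shape[1]
    Yc = Y - Y.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Yc @ Yc.T / g)      # eigenvalues in ascending order
    d, E = d[-k:], E[:, -k:]                  # keep the top-k subspace
    Z = (E / np.sqrt(d)).T @ Yc               # whitened data, k x n_genes
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        T = np.tanh(W @ Z)
        W_new = sym_decorrelate(
            T @ Z.T / g - (1.0 - T ** 2).mean(axis=1)[:, None] * W
        )
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    A = (E * np.sqrt(d)) @ W.T                # mixing matrix: sample loadings
    return A, W @ Z


def confounder_covariance(Y, G, k=5, alpha=0.05):
    """Build K from the ICA components NOT associated with any genotype.

    Assumption for this sketch: a component is called 'genetic' if its sample
    loadings correlate with some SNP at Bonferroni-corrected level alpha.
    """
    n, m = G.shape
    A, S = fast_ica(Y, k)
    Gs = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)
    As = (A - A.mean(axis=0)) / (A.std(axis=0) + 1e-12)
    R = np.clip(As.T @ Gs / n, -0.999999, 0.999999)   # k x m correlations
    z_max = np.abs(np.arctanh(R)).max(axis=1) * math.sqrt(n - 3)
    p_min = np.array([math.erfc(z / math.sqrt(2.0)) for z in z_max])
    genetic = np.flatnonzero(p_min * m * k < alpha)
    kept = np.setdiff1d(np.arange(k), genetic)
    Y_conf = A[:, kept] @ S[kept]             # expression rebuilt from kept components
    K = Y_conf @ Y_conf.T / Y.shape[1]        # n x n sample covariance
    return K, kept, genetic


# Toy data (hypothetical): SNP 0 shifts 100 genes, a batch effect shifts 150 genes.
rng = np.random.default_rng(1)
n, g, m = 60, 300, 20
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
batch = rng.integers(0, 2, size=n).astype(float)
beta = np.zeros(g)
beta[:100] = rng.normal(0.0, 1.0, 100)
gamma = np.zeros(g)
gamma[100:250] = rng.normal(0.0, 1.0, 150)
Y = np.outer(G[:, 0], beta) + np.outer(batch, gamma) + rng.normal(0.0, 1.0, (n, g))

K, kept, genetic = confounder_covariance(Y, G)
print("components kept for K:", kept, "| flagged as genetic:", genetic)
```

In a mixed model of the form y = X b + u + e with u ~ N(0, s2 K), a matrix K built this way is intended to absorb confounding structure while leaving variation attributed to broad impact eQTL available to the fixed-effect test. Which components end up flagged in the toy data depends on the simulated effect sizes and on how well the decomposition separates the sources.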
