An automated quality control pipeline for eQTL analysis with RNA-seq data

Expression quantitative trait loci (eQTL) analysis is critical for understanding the mechanisms underlying trait-associated variants. Evaluating and controlling the quality of the transcript and genotype data on which eQTL analysis is based remains challenging for researchers with limited computational backgrounds. There is a strong need for a user-friendly, comprehensive tool that preprocesses these data sets automatically. Here we propose such a solution, eQTLQC, an automated quality control pipeline for preprocessing both RNA-seq and genotype data. The eQTLQC pipeline provides multiple informative quality control measurements and data normalization approaches. It also provides an easy-to-use configuration file that lets users flexibly set parameters and control the pipeline. We demonstrate its utility by performing RNA-seq and genotype preprocessing on real data sets. eQTLQC is open source and freely available at https://github.com/ruanjunpeng/eQTLQC.
