Evaluation of Genotype-Based Gene Expression Model Performance: A Cross-Framework and Cross-Dataset Study

Predicting gene expression from genotype data is valuable for studying inaccessible tissues such as the brain. Herein we present eGenScore, a polygenic/poly-variation method, and compare it with PrediXcan, a method based on regularized linear regression with elastic nets. Although both methods aim to predict gene expression from genotype, they differ in important methodological respects. We compared how well expression quantitative trait loci (eQTL) models predict gene expression in the frontal cortex across the two frameworks (eGenScore vs. PrediXcan) and across training datasets (BrainEAC, which is brain-specific, vs. GTEx, which covers multiple tissues). In addition to internal five-fold cross-validation, we externally validated the gene expression models using the CommonMind Consortium database. Our results showed that (1) PrediXcan outperforms eGenScore regardless of the training database used; and (2) with PrediXcan, eQTL models for the frontal cortex perform better when trained on GTEx than on BrainEAC.
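To make the evaluation idea concrete, the following is a minimal sketch of elastic-net regression of one gene's expression on SNP dosages, scored by five-fold cross-validation. It is not the PrediXcan or eGenScore implementation: the coordinate-descent solver, the penalty settings (`lam`, `alpha`), the pooled squared Pearson correlation used as the score, and the simulated genotypes and effect sizes are all assumptions chosen for illustration.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator for the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def fit_elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - b0 - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).
    lam and alpha are illustrative values, not tuned settings."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]  # current residual y - b0 - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Covariance of SNP j with the residual that excludes SNP j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
        shift = sum(resid) / n  # re-centre the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    return sab * sab / (saa * sbb)

def cross_validated_r2(X, y, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, score pooled predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    observed, predicted = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_elastic_net([X[i] for i in train], [y[i] for i in train])
        predicted += predict(model, [X[i] for i in held_out])
        observed += [y[i] for i in held_out]
    return pearson_r2(observed, predicted)

# Simulated data (hypothetical): 100 individuals, 20 SNPs coded as 0/1/2 dosages,
# with three SNPs acting as eQTLs for a single gene.
rng = random.Random(1)
n_samples, n_snps = 100, 20
X = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_samples)]
true_effects = [0.8, -0.5, 0.4] + [0.0] * (n_snps - 3)
y = [sum(w * g for w, g in zip(true_effects, row)) + rng.gauss(0, 0.5) for row in X]

cv_r2 = cross_validated_r2(X, y)
print(f"five-fold cross-validated R^2: {cv_r2:.3f}")
```

External validation follows the same pattern: the model is fitted on the full training dataset and `predict` is applied to genotypes from an independent cohort, whose measured expression is then compared with the predictions.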
