Biological relevance of computationally predicted pathogenicity of noncoding variants

Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate whether the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. We find a significant disconnect between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists.

Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance between predictions and experimental confirmation.
