MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect

Multiplex Assays of Variant Effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here we present MaveDB, a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first of these applications, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
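A variant effect map arranges per-variant scores by sequence position and substituted residue. The sketch below shows that idea in minimal form. It assumes scores are keyed by HGVS protein-level notation for single amino acid substitutions (e.g. `p.Ala12Val`). The names `build_effect_map` and `render_grid` are hypothetical and do not describe the actual MaveDB or MaveVis interfaces.

```python
import re
from collections import defaultdict

# Three-letter to one-letter amino acid codes, plus "Ter" for a stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches a single protein substitution such as "p.Ala12Val" or "p.Leu3Ter".
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def build_effect_map(scores):
    """Group per-variant scores into {position: {alt_residue: score}}.

    Hypothetical helper: only single amino acid substitutions are kept.
    Anything else (e.g. synonymous "p.=", multi-variant strings, indels)
    is skipped.
    """
    effect_map = defaultdict(dict)
    for hgvs_pro, score in scores.items():
        match = SUBSTITUTION.match(hgvs_pro)
        if match is None:
            continue
        ref, pos, alt = match.groups()
        if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
            continue
        effect_map[int(pos)][THREE_TO_ONE[alt]] = score
    return dict(effect_map)


def render_grid(effect_map, alphabet="ACDEFGHIKLMNPQRSTVWY*"):
    """Return one text row per position with one column per residue.

    Measured cells show the score to two decimals; '.' marks an
    unmeasured substitution.
    """
    rows = []
    for pos in sorted(effect_map):
        cells = [
            f"{effect_map[pos][aa]:5.2f}" if aa in effect_map[pos] else "    ."
            for aa in alphabet
        ]
        rows.append(f"{pos:>4} " + " ".join(cells))
    return rows
```

For example, `build_effect_map({"p.Ala2Val": -1.2, "p.Ala2Gly": 0.1, "p.Leu3Ter": -2.0, "p.=": 0.0})` groups the three substitutions under positions 2 and 3 and drops the synonymous entry. A real viewer would draw a colored heatmap instead of text. The dictionary-of-dictionaries layout still captures the same position-by-residue structure, and it stores missing measurements sparsely.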
