Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan

Recent technological advances in screening large combinatorial libraries through high-throughput mutational scans have deepened our understanding of adaptive protein evolution and boosted its applications in protein design. However, the vast number of possible genotypes calls for suitable computational methods for data analysis, prediction of mutational effects, and generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans and obtained accurate predictions of the effects of mutations on fitness. The inferred fitness landscape is robust to experimental and sampling noise and generalizes well, both in exploring broader regions of sequence space and in predicting higher-fitness variants. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.
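The abstract does not spell out the functional form of the genotype-fitness model. As an illustration only, the sketch below assumes a pairwise (Potts-like) model, a common choice when epistasis and residue contacts are of interest: a fitness F(s) built from site fields h_i plus pairwise couplings J_ij. Under that assumption, a single-mutation effect is a fitness difference relative to the wild type, epistasis between two sites is carried entirely by their coupling block, and candidate 3D contacts can be ranked by the Frobenius norm of each (zero-sum) coupling block with the average product correction (APC), as in direct-coupling analysis. All function names, the toy sequence and the randomly drawn parameters are hypothetical; this is not the authors' method, and a real model would be fit to the multi-round sequencing data rather than sampled at random.

```python
import itertools
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 amino acids (gap symbol omitted for simplicity)
Q = len(ALPHABET)
IDX = {a: k for k, a in enumerate(ALPHABET)}


def random_model(length, scale=0.1, seed=0):
    """Hypothetical parameters: fields h[i][a] and couplings J[(i, j)][a][b] for i < j."""
    rng = random.Random(seed)
    h = [[rng.gauss(0.0, scale) for _ in range(Q)] for _ in range(length)]
    J = {
        (i, j): [[rng.gauss(0.0, scale) for _ in range(Q)] for _ in range(Q)]
        for i, j in itertools.combinations(range(length), 2)
    }
    return h, J


def fitness(seq, h, J):
    """Assumed pairwise form: F(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j)."""
    x = [IDX[a] for a in seq]
    f = sum(h[i][x[i]] for i in range(len(x)))
    f += sum(Jij[x[i]][x[j]] for (i, j), Jij in J.items())
    return f


def mutational_effect(wt, pos, aa, h, J):
    """Fitness change of the single substitution wt[pos] -> aa relative to the wild type."""
    mutant = wt[:pos] + aa + wt[pos + 1:]
    return fitness(mutant, h, J) - fitness(wt, h, J)


def zero_sum(block):
    """Shift a Q x Q coupling block to the zero-sum gauge (rows and columns sum to 0)."""
    row = [sum(r) / Q for r in block]
    col = [sum(block[a][b] for a in range(Q)) / Q for b in range(Q)]
    tot = sum(row) / Q  # grand mean of the block
    return [[block[a][b] - row[a] - col[b] + tot for b in range(Q)] for a in range(Q)]


def contact_scores(J, length):
    """Frobenius norm of each zero-sum coupling block, corrected with APC."""
    fro = {}
    for pair, block in J.items():
        z = zero_sum(block)
        fro[pair] = math.sqrt(sum(v * v for r in z for v in r))
    # Mean score of each site over its (length - 1) partners, and the overall mean.
    site_mean = [0.0] * length
    for (i, j), s in fro.items():
        site_mean[i] += s
        site_mean[j] += s
    site_mean = [m / (length - 1) for m in site_mean]
    overall = sum(fro.values()) / len(fro)
    return {
        (i, j): s - site_mean[i] * site_mean[j] / overall
        for (i, j), s in fro.items()
    }


if __name__ == "__main__":
    wt = "MKTAYIAK"  # toy "wild type", not from the paper
    h, J = random_model(len(wt))
    print("effect of T3W:", round(mutational_effect(wt, 2, "W", h, J), 4))
    scores = contact_scores(J, len(wt))
    print("top-scoring pairs:", sorted(scores, key=scores.get, reverse=True)[:3])
```

Because the fields cancel in the double-mutant comparison, any deviation from additivity (epistasis) in this toy model comes only from the coupling block J_ij, which is why coupling strength is the natural quantity to compare against structural contacts.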
