Protein design and variant prediction using autoregressive generative models

[1]  Anna G. Green,et al.  Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences , 2021, Nature Communications.

[2]  Myle Ott,et al.  Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2019, Proceedings of the National Academy of Sciences.

[4]  Chang C. Liu,et al.  Rapid generation of potent antibodies by autonomous hypermutation in yeast , 2020, bioRxiv.

[5]  Serge Muyldermans,et al.  A guide to: generation and design of nanobodies , 2020, The FEBS journal.

[6]  N. Krogan,et al.  An ultra-potent synthetic nanobody neutralizes SARS-CoV-2 by locking Spike into an inactive conformation , 2020, bioRxiv.

[7]  Georg Seelig,et al.  A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences , 2020, Cell systems.

[8]  C. Deane,et al.  How repertoire data are changing antibody science , 2020, The Journal of Biological Chemistry.

[9]  Joseph A. Marsh,et al.  Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations , 2019, bioRxiv.

[10]  Anna G. Green,et al.  Proteome-scale discovery of protein interactions with residue-level resolution using sequence coevolution , 2019, bioRxiv.

[11]  Wojciech Samek,et al.  UDSMProt: universal deep sequence models for protein classification , 2019, bioRxiv.

[12]  D. Baker,et al.  Protein interaction networks revealed by proteome coevolution , 2019, Science.

[13]  Ziheng Wang,et al.  Antibody complementarity determining region design using high-capacity machine learning , 2019, bioRxiv.

[14]  John Canny,et al.  Evaluating Protein Transfer Learning with TAPE , 2019, bioRxiv.

[15]  A. Porrello,et al.  Intrinsically disordered proteins and structured proteins with intrinsically disordered regions have different functional roles in the cell , 2019, bioRxiv.

[16]  Y. Choong,et al.  Cognizance of Molecular Methods for the Generation of Mutagenic Phage Display Antibody Libraries for Affinity Maturation , 2019, International journal of molecular sciences.

[17]  Ekaterina V Putintseva,et al.  An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape , 2019, PLoS genetics.

[18]  Regina Barzilay,et al.  Generative Models for Graph-Based Protein Design , 2019, DGS@ICLR.

[19]  George M. Church,et al.  Unified rational protein engineering with sequence-only deep representation learning , 2019, bioRxiv.

[20]  Victor Greiff,et al.  Large-scale network analysis reveals the sequence space architecture of antibody repertoires , 2019, Nature Communications.

[21]  Jennifer Listgarten,et al.  Conditioning by adaptive sampling for robust design , 2019, ICML.

[22]  Arjun Ravikumar,et al.  Scalable, Continuous Evolution of Genes at Mutation Rates above Genomic Error Thresholds , 2018, Cell.

[23]  Debora S Marks,et al.  Deep generative models of genetic variation capture the effects of mutations , 2018, Nature Methods.

[24]  David T. Jones,et al.  Design of metalloproteins and novel protein folds using variational autoencoders , 2018, Scientific Reports.

[25]  Quentin Marcou,et al.  High-throughput immune repertoire analysis with IGoR , 2017, Nature Communications.

[26]  Alexander M. Rush,et al.  Semi-Amortized Variational Autoencoders , 2018, ICML.

[27]  Taylor L. Mighell,et al.  A saturation mutagenesis approach to understanding PTEN lipid phosphatase activity and genotype-phenotype relationships , 2018, bioRxiv.

[28]  Conor McMahon,et al.  Yeast surface display platform for rapid discovery of conformationally selective nanobodies , 2018, Nature Structural & Molecular Biology.

[29]  Johannes Söding,et al.  Clustering huge protein sequence sets in linear time , 2017, Nature Communications.

[30]  Jay Shendure,et al.  Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. , 2017, Cell systems.

[31]  Chunlei Liu,et al.  ClinVar: improving access to variant interpretations and supporting evidence , 2017, Nucleic Acids Res..

[32]  T. S. Lim,et al.  Naïve Human Antibody Libraries for Infectious Diseases , 2018, Advances in experimental medicine and biology.

[33]  Oriol Vinyals,et al.  Neural Discrete Representation Learning , 2017, NIPS.

[34]  Alexander M. Rush,et al.  Dilated Convolutions for Modeling Long-Distance Genomic Dependencies , 2017, bioRxiv.

[35]  E. Goldman,et al.  Evaluation of anti‐botulinum neurotoxin single domain antibodies with additional optimization for improved production and stability , 2017, Toxicon : official journal of the International Society on Toxinology.

[36]  Xinghua Shi,et al.  Effects of short indels on protein structure and function in human genomes , 2017, Scientific Reports.

[37]  Ryan L Kelly,et al.  Nonspecificity in a nonimmune human scFv repertoire , 2017, mAbs.

[38]  S. Loi,et al.  Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. , 2017, The Lancet. Oncology.

[39]  M. Seeger,et al.  Synthetic single domain antibodies for the conformational trapping of membrane proteins , 2017, bioRxiv.

[40]  ICGC,et al.  Pan-cancer analysis of whole genomes , 2017, bioRxiv.

[41]  Tilman Flock,et al.  Exploiting sequence and stability information for directing nanobody stability engineering , 2017, Biochimica et biophysica acta. General subjects.

[42]  Lukasz Kaiser,et al.  Attention Is All You Need , 2017, NIPS.

[43]  Samy Bengio,et al.  Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.

[44]  Zhiting Hu,et al.  Improved Variational Autoencoders for Text Modeling using Dilated Convolutions , 2017, ICML.

[45]  Thomas A. Hopf,et al.  Mutation effects predicted from sequence co-variation , 2017, Nature Biotechnology.

[46]  A. Pashov,et al.  Antibody repertoire profiling with mimotope arrays , 2017, Human vaccines & immunotherapeutics.

[47]  K Dane Wittrup,et al.  Biophysical properties of the clinical-stage antibody landscape , 2017, Proceedings of the National Academy of Sciences.

[48]  Alex Graves,et al.  Neural Machine Translation in Linear Time , 2016, ArXiv.

[49]  Ashwin K. Vijayakumar,et al.  Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models , 2016, ArXiv.

[50]  Susanne Müller,et al.  Generation and analyses of human synthetic antibody libraries and their application for protein microarrays. , 2016, Protein engineering, design & selection : PEDS.

[51]  D. Baker,et al.  The coming of age of de novo protein design , 2016, Nature.

[52]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[53]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[54]  Jeffrey J. Gray,et al.  Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires , 2016, Proceedings of the National Academy of Sciences.

[55]  Andrea Pagnani,et al.  Maximum-Entropy Models of Sequenced Immune Repertoires Predict Antigen-Antibody Affinity , 2016, PLoS Comput. Biol..

[56]  Tim Salimans,et al.  Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[57]  M. Weigt,et al.  Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1 , 2015, bioRxiv.

[58]  Karsten M. Borgwardt,et al.  The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity , 2015, Human mutation.

[59]  Kendrick B. Turner,et al.  Improving the biophysical properties of anti-ricin single-domain antibodies , 2015, Biotechnology reports.

[60]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[61]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[62]  Robin A. Weiss,et al.  Molecular Evolution of Broadly Neutralizing Llama Antibodies to the CD4-Binding Site of HIV-1 , 2014, PLoS pathogens.

[63]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[64]  S. Fields,et al.  Deep mutational scanning: a new style of protein science , 2014, Nature Methods.

[65]  John P. Barton,et al.  The Fitness Landscape of HIV-1 Gag: Advanced Modeling Approaches and Validation of Model Predictions by In Vitro Testing , 2014, PLoS Comput. Biol..

[66]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[67]  Debora S. Marks,et al.  Sequence co-evolution gives 3D contacts and structures of protein complexes , 2014, bioRxiv.

[68]  D. Baker,et al.  Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information , 2014, eLife.

[69]  Christopher J. Oldfield,et al.  Classification of Intrinsically Disordered Regions and Proteins , 2014, Chemical reviews.

[70]  J. Shendure,et al.  A general framework for estimating the relative pathogenicity of human genetic variants , 2014, Nature Genetics.

[71]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[72]  E. Goldman,et al.  Contributions of the Complementarity Determining Regions to the Thermal Stability of a Single-Domain Antibody , 2013, PloS one.

[73]  D. Baker,et al.  Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era , 2013, Proceedings of the National Academy of Sciences.

[74]  S. Kaveri,et al.  Antibody Polyreactivity in Health and Disease: Statu Variabilis , 2013, Journal of Immunology.

[75]  Serge Muyldermans,et al.  Nanobodies: natural single-domain antibodies. , 2013, Annual review of biochemistry.

[76]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[77]  George Georgiou,et al.  High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire , 2013, Nature Biotechnology.

[78]  I. Adzhubei,et al.  Predicting Functional Effect of Human Missense Mutations Using PolyPhen‐2 , 2013, Current protocols in human genetics.

[79]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[80]  Thomas A. Hopf,et al.  Protein structure prediction from sequence variation , 2012, Nature Biotechnology.

[81]  J. Miller,et al.  Predicting the Functional Effect of Amino Acid Substitutions and Indels , 2012, PloS one.

[82]  Thomas A. Hopf,et al.  Three-Dimensional Structures of Membrane Proteins from Genomic Sequencing , 2012, Cell.

[83]  Jing Hu,et al.  SIFT web server: predicting effects of amino acid substitutions on proteins , 2012, Nucleic Acids Res..

[84]  J. Dimitrov,et al.  Antibody polyspecificity: what does it matter? , 2012, Advances in experimental medicine and biology.

[85]  Thomas A. Hopf,et al.  Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.

[86]  Sean R. Eddy,et al.  Accelerated Profile HMM Searches , 2011, PLoS Comput. Biol..

[87]  C. Sander,et al.  Predicting the functional impact of protein mutations: application to cancer genomics , 2011, Nucleic acids research.

[88]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[89]  Ryan E. Mills,et al.  Natural genetic variation caused by small insertions and deletions in the human genome. , 2011, Genome research.

[90]  John McCafferty,et al.  Beyond natural antibodies: the power of in vitro display technologies , 2011, Nature Biotechnology.

[91]  Ryan E. Mills,et al.  Small insertions and deletions (INDELs) in human genomes. , 2010, Human molecular genetics.

[92]  David Baker,et al.  An exciting but challenging road ahead for computational enzyme design , 2010, Protein science : a publication of the Protein Society.

[93]  R. Beerli,et al.  Mining human antibody repertoires , 2010, mAbs.

[94]  David T. Jones,et al.  Modularity of intrinsic disorder in the human proteome , 2010, Proteins.

[95]  W. Bialek,et al.  Maximum entropy models for antibody diversity , 2009, Proceedings of the National Academy of Sciences.

[96]  Philip A. Romero,et al.  Exploring protein fitness landscapes by directed evolution , 2009, Nature Reviews Molecular Cell Biology.

[97]  F. Arnold,et al.  Directed evolution: new parts and optimized function. , 2009, Current opinion in biotechnology.

[98]  A Keith Dunker,et al.  Unfoldomics of human diseases: linking protein intrinsic disorder with diseases , 2009, BMC Genomics.

[99]  Bartek Wilczynski,et al.  Biopython: freely available Python tools for computational molecular biology and bioinformatics , 2009, Bioinform..

[100]  M. Nussenzweig,et al.  Predominant Autoantibody Production by Early Human B Cell Precursors , 2003, Science.

[101]  V. Giudicelli,et al.  IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. , 2003, Developmental and comparative immunology.

[102]  B. de Geus,et al.  Llama heavy-chain V regions consist of at least four distinct subfamilies revealing novel sequence features. , 2000, Molecular immunology.

[103]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998.

[104]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[105]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[106]  D. Hochstrasser,et al.  The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences , 1993, Electrophoresis.

[107]  R. Doolittle,et al.  A simple method for displaying the hydropathic character of a protein. , 1982, Journal of molecular biology.