Embeddings from protein language models predict conservation and variant effects

[1]  B. Rost,et al.  Protein embeddings and deep learning predict binding residues for various ligand classes , 2021, Scientific Reports.

[2]  B. Rost,et al.  Protein language model embeddings for fast, accurate, alignment-free protein structure prediction , 2021, bioRxiv.

[3]  Oriol Vinyals,et al.  Highly accurate protein structure prediction with AlphaFold , 2021, Nature.

[4]  Tom Sercu,et al.  Language models enable zero-shot prediction of the effects of mutations on protein function , 2021, bioRxiv.

[5]  B. Berger,et al.  Learning the protein language: Evolution, structure, and function. , 2021, Cell systems.

[6]  N. Ben-Tal,et al.  Editorial overview: Sequences and topology: 'paths from sequence to structure'. , 2021, Current opinion in structural biology.

[7]  Kevin K. Yang,et al.  Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets , 2021, Current protocols.

[8]  B. Rost,et al.  Light attention predicts protein location from the language of life , 2021, bioRxiv.

[9]  Michal Linial,et al.  The language of proteins: NLP, machine learning & protein sequences , 2021, Computational and structural biotechnology journal.

[10]  B. Rost,et al.  PredictProtein - Predicting Protein Structure and Function for 29 Years , 2021, bioRxiv.

[11]  B. Rost,et al.  Clustering FunFams using sequence embeddings improves EC purity , 2021, bioRxiv.

[12]  R. Kolodny,et al.  Searching protein space for ancient sub-domain segments. , 2021, Current opinion in structural biology.

[13]  Peter B. McGarvey,et al.  UniProt: the universal protein knowledgebase in 2021 , 2020, Nucleic Acids Res..

[14]  Myle Ott,et al.  Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2019, Proceedings of the National Academy of Sciences.

[15]  Llion Jones,et al.  ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning , 2021 .

[16]  Tom Sercu,et al.  Transformer protein language models are unsupervised structure learners , 2020, bioRxiv.

[17]  Ewen Callaway,et al.  ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures , 2020, Nature.

[18]  Hemalatha Balaram,et al.  Toward Developing Intuitive Rules for Protein Variant Effect Prediction Using Deep Mutational Scanning Data , 2020, ACS omega.

[19]  Burkhard Rost,et al.  Embeddings from deep learning transfer GO annotations beyond homology , 2020, Scientific Reports.

[20]  Bosco K. Ho,et al.  Systematic modeling of SARS-CoV-2 protein structures , 2020 .

[21]  Bosco K. Ho,et al.  SARS-CoV-2 structural coverage map reveals state changes that disrupt host immunity , 2020, bioRxiv.

[22]  B. Rost,et al.  ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing , 2020, bioRxiv.

[23]  S. Rowland-Jones,et al.  Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus , 2020, Cell.

[24]  Raghunath Chatterjee,et al.  Characterizations of SARS-CoV-2 mutational profile, spike protein stability and viral transmission , 2020, bioRxiv.

[25]  F. Giorgi,et al.  Geographic and Genomic Distribution of SARS-CoV-2 Mutations , 2020, Frontiers in Microbiology.

[26]  Nikhil Naik,et al.  ProGen: Language Modeling for Protein Generation , 2020, bioRxiv.

[27]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..

[28]  Wei Wang,et al.  Mutation effect estimation on protein–protein interactions using deep contextualized representation learning , 2019, bioRxiv.

[29]  Burkhard Rost,et al.  Modeling aspects of the language of life through transfer-learning protein sequences , 2019, BMC Bioinformatics.

[30]  Burkhard Rost,et al.  Variant effect predictions capture some aspects of deep mutational scanning experiments , 2019, BMC Bioinformatics.

[31]  Joseph A. Marsh,et al.  Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations , 2019, bioRxiv.

[32]  N. Ben-Tal,et al.  ConSurf‐DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins , 2019, Protein science : a publication of the Protein Society.

[33]  Alan F. Rubin,et al.  MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect , 2019, Genome Biology.

[34]  George M. Church,et al.  Unified rational protein engineering with sequence-only deep representation learning , 2019, bioRxiv.

[35]  Bonnie Berger,et al.  Learning protein sequence embeddings using information from structure , 2019, ICLR.

[36]  A. Carbone,et al.  GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects , 2019, bioRxiv.

[37]  Alan F. Scott,et al.  OMIM.org: leveraging knowledge across phenotype–gene relationships , 2018, Nucleic Acids Res..

[38]  David S. Goodsell,et al.  RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy , 2018, Nucleic Acids Res..

[39]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[40]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[41]  Thomas A. Hopf,et al.  Evolutionary couplings and sequence variation effect predict protein binding sites , 2018, Proteins.

[42]  Debora S Marks,et al.  Deep generative models of genetic variation capture the effects of mutations , 2018, Nature Methods.

[43]  H. Hepach References in Figures , 2018 .

[44]  Frederick P. Roth,et al.  Multiplexed assays of variant effects contribute to a growing genotype–phenotype atlas , 2018, Human Genetics.

[45]  Joseph D. Janizek,et al.  Accurate classification of BRCA1 variants with saturation genome editing , 2018, Nature.

[46]  Vanessa E. Gray,et al.  Multiplex Assessment of Protein Variant Abundance by Massively Parallel Sequencing , 2018, Nature Genetics.

[47]  Johannes Söding,et al.  Clustering huge protein sequence sets in linear time , 2017, Nature Communications.

[48]  Jay Shendure,et al.  Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. , 2017, Cell systems.

[49]  Debora S. Marks,et al.  Deep generative models of genetic variation capture mutation effects , 2017, bioRxiv.

[50]  Johannes Söding,et al.  MMseqs2: sensitive protein sequence searching for the analysis of massive data sets , 2017, bioRxiv.

[51]  Thomas A. Hopf,et al.  Mutation effects predicted from sequence co-variation , 2017, Nature Biotechnology.

[52]  Y. Bromberg,et al.  Computational predictors fail to identify amino acid substitution effects at rheostat positions , 2017, Scientific Reports.

[53]  Inês Barroso,et al.  Prospective functional classification of all possible missense variants in PPARG , 2016, Nature Genetics.

[54]  Burkhard Rost,et al.  Predicted Molecular Effects of Sequence Variants Link to System Level of Disease , 2016, PLoS Comput. Biol..

[55]  J. Söding,et al.  A vocabulary of ancient peptides at the origin of folded proteins , 2015, eLife.

[56]  Piero Fariselli,et al.  INPS: predicting the impact of non-synonymous variations on protein stability from sequence , 2015, Bioinform..

[57]  B. Rost,et al.  Better prediction of functional effects for sequence variants , 2015, BMC Genomics.

[58]  Karsten M. Borgwardt,et al.  The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity , 2015, Human mutation.

[59]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[60]  M. Vihinen,et al.  PON-P2: Prediction Method for Fast and Reliable Identification of Harmful Variants , 2015, PloS one.

[61]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[62]  O. Lichtarge,et al.  A formal perturbation equation between genotype and phenotype determines the Evolutionary Action of protein-coding variations on fitness , 2014, Genome research.

[63]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[64]  S. Fields,et al.  Deep mutational scanning: a new style of protein science , 2014, Nature Methods.

[65]  J. Shendure,et al.  A general framework for estimating the relative pathogenicity of human genetic variants , 2014, Nature Genetics.

[66]  Yana Bromberg,et al.  News from the protein mutability landscape. , 2013, Journal of molecular biology.

[67]  S. Eddy,et al.  Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions , 2013, Nucleic acids research.

[68]  Benoit H. Dessailly,et al.  Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. , 2013, The Biochemical journal.

[69]  K. Katoh,et al.  MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability , 2013, Molecular biology and evolution.

[70]  Zhengwei Zhu,et al.  CD-HIT: accelerated for clustering the next-generation sequencing data , 2012, Bioinform..

[71]  Jing Hu,et al.  SIFT web server: predicting effects of amino acid substitutions on proteins , 2012, Nucleic Acids Res..

[72]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[73]  Jana Marie Schwarz,et al.  MutationTaster evaluates disease-causing potential of sequence alterations , 2010, Nature Methods.

[74]  P. Bork,et al.  A method and server for predicting damaging missense mutations , 2010, Nature Methods.

[75]  Burkhard Rost,et al.  Correlating protein function and stability through the analysis of single amino acid substitutions , 2009, BMC Bioinformatics.

[76]  Burkhard Rost,et al.  Comprehensive in silico mutagenesis highlights functionally important residues in proteins , 2008, ECCB.

[77]  B. Rost,et al.  SNAP: predict effect of non-synonymous polymorphisms on function , 2007, Nucleic acids research.

[78]  Piero Fariselli,et al.  I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure , 2005, Nucleic Acids Res..

[79]  Piero Fariselli,et al.  ConSeq: the identification of functionally and structurally important residues in protein sequences , 2004, Bioinform..

[80]  Burkhard Rost,et al.  CHOP proteins into structural domain‐like fragments , 2004, Proteins.

[81]  Maria Nikolopoulou,et al.  Evaluation of Tools , 2004 .

[82]  B. Rost,et al.  Sequence-based prediction of protein domains. , 2004, Nucleic acids research.

[83]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[84]  Steven Henikoff,et al.  SIFT: predicting amino acid changes that affect protein function , 2003, Nucleic Acids Res..

[85]  Burkhard Rost,et al.  Domains, motifs and clusters in the protein universe. , 2003, Current opinion in chemical biology.

[86]  P. Bork,et al.  Human non-synonymous SNPs: server and survey. , 2002, Nucleic acids research.

[87]  J. Moult,et al.  SNPs, protein structure, and disease , 2001, Human mutation.

[88]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[89]  Motonori Ota,et al.  The Protein Mutant Database , 1999, Nucleic Acids Res..

[90]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[91]  B. Efron,et al.  Bootstrap confidence levels for phylogenetic trees. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[93]  B. Rost PHD: predicting one-dimensional protein structure by profile-based neural networks. , 1996, Methods in enzymology.

[94]  J. Rashbass Online Mendelian Inheritance in Man. , 1995, Trends in genetics : TIG.

[95]  K. Nishikawa,et al.  Constructing a protein mutant database. , 1994, Protein engineering.

[96]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[97]  Chris Sander,et al.  Jury returns on structure prediction , 1992, Nature.

[98]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[99]  Zengo Furukawa,et al.  A General Framework for , 1991 .

[100]  K. Nagai,et al.  Coordinated amino acid changes in homologous protein families. , 1988, Protein engineering.

[101]  KUNIHIKO FUKUSHIMA,et al.  Visual Feature Extraction by a Multilayered Network of Analog Threshold Elements , 1969, IEEE Trans. Syst. Sci. Cybern..