Single-sequence protein structure prediction using a language model and deep learning
