Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
Myle Ott | R. Fergus | C. L. Zitnick | Alexander Rives | Siddharth Goyal | Demi Guo | Jerry Ma | Joshua Meier
[1] Zellig S. Harris,et al. Distributional Structure , 1954 .
[2] C. Yanofsky,et al. Protein Structure Relationships Revealed by Mutational Analysis , 1964, Science.
[3] M. Levitt. Conformational preferences of amino acids in globular proteins. , 1978, Biochemistry.
[5] W. Kabsch,et al. Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.
[6] A. Lesk,et al. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. , 1987, Journal of molecular biology.
[7] K. Nagai,et al. Coordinated amino acid changes in homologous protein families. , 1988, Protein engineering.
[8] E. Myers,et al. Basic local alignment search tool. , 1990, Journal of molecular biology.
[9] C. Sander,et al. Correlated Mutations and Residue Contacts , 1994 .
[10] C. Sander,et al. Correlated mutations and residue contacts in proteins , 1994, Proteins.
[11] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[12] David C. Jones,et al. CATH--a hierarchic classification of protein domain structures. , 1997, Structure.
[13] G. Stormo,et al. Correlated mutations in protein sequences: Phylogenetic and structural effects , 1997 .
[14] Sean R. Eddy,et al. Profile hidden Markov models , 1998, Bioinform..
[15] M. Cosgrove,et al. On the mechanism of the reaction catalyzed by glucose 6-phosphate dehydrogenase. , 1998, Biochemistry.
[16] S F Altschul,et al. Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. , 1998, Trends in biochemical sciences.
[17] B. Kräutler,et al. Structure and dynamics of the B12-binding subunit of glutamate mutase from Clostridium cochlearium. , 1999, European journal of biochemistry.
[18] G. Stormo,et al. Correlated mutations in models of protein sequences: phylogenetic and structural effects , 1999 .
[19] G J Barton,et al. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.
[20] D T Jones,et al. Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.
[21] T. Mizuno,et al. Structure of the histidine-containing phosphotransfer (HPt) domain of the anaerobic sensor protein ArcB complexed with the chemotaxis response regulator CheY. , 1999, Acta crystallographica. Section D, Biological crystallography.
[22] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[23] B. Seaton,et al. The crystal structure of MarR, a regulator of multiple antibiotic resistance, at 2.3 Å resolution , 2001, Nature Structural Biology.
[24] J Overbaugh,et al. Selection Forces and Constraints on Retroviral Sequence Variation , 2001, Science.
[25] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[26] Peer Bork,et al. Impact of selection, mutation rate and genetic drift on human genetic variation. , 2003, Human molecular genetics.
[27] Adam Zemla,et al. Critical assessment of methods of protein structure prediction (CASP)‐round V , 2005, Proteins.
[28] Cathy H. Wu,et al. The Universal Protein Resource (UniProt) , 2004, Nucleic Acids Res..
[29] D. Kihara. The effect of long‐range interactions on the secondary structure formation of proteins , 2005, Protein science : a publication of the Protein Society.
[30] Johannes Söding,et al. Protein homology detection by HMM-HMM comparison , 2005, Bioinform..
[31] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[32] Alexander Zien,et al. Semi-Supervised Learning , 2006 .
[33] Roland L. Dunbrack. Sequence comparison and protein structure prediction. , 2006, Current opinion in structural biology.
[34] Peter B. McGarvey,et al. UniRef: comprehensive and non-redundant UniProt reference clusters , 2007, Bioinform..
[35] T. Gabaldón. Evolution of proteins and proteomes: a phylogenetics approach , 2005, Evolutionary bioinformatics online.
[36] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[37] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[38] E. Birney,et al. Pfam: the protein families database , 2013, Nucleic Acids Res..
[39] Chris Bailey-Kellogg,et al. Graphical Models of Residue Coupling in Protein Families , 2005, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[40] Geoffrey E. Hinton. Reducing the Dimensionality of Data with Neural , 2008 .
[41] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[42] Philip A. Romero,et al. Exploring protein fitness landscapes by directed evolution , 2009, Nature Reviews Molecular Cell Biology.
[43] T. Hwa,et al. Identification of direct residue contacts in protein–protein interaction by message passing , 2009, Proceedings of the National Academy of Sciences.
[44] S. Henikoff,et al. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm , 2009, Nature Protocols.
[45] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[46] Tim A. H. te Beek,et al. A series of PDB related databases for everyday needs , 2010, Nucleic Acids Res..
[47] Ilya Sutskever,et al. Subword Language Modeling with Neural Networks , 2011 .
[48] Thomas A. Hopf,et al. Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.
[49] Johannes Söding,et al. Protein sequence comparison and fold recognition: progress and good-practice benchmarking. , 2011, Current opinion in structural biology.
[50] Sivaraman Balakrishnan,et al. Learning generative models for protein fold families , 2011, Proteins.
[51] A. Tramontano,et al. Critical assessment of methods of protein structure prediction (CASP)—round IX , 2011, Proteins.
[52] C. Sander,et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families , 2011, Proceedings of the National Academy of Sciences.
[53] A. Biegert,et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.
[54] Massimiliano Pontil,et al. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..
[55] E. Aurell,et al. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. , 2012, Physical review. E, Statistical, nonlinear, and soft matter physics.
[56] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[57] Sahand Hormoz,et al. Amino acid composition of proteins reduces deleterious impact of mutations , 2013, Scientific Reports.
[58] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[59] I. Adzhubei,et al. Predicting Functional Effect of Human Missense Mutations Using PolyPhen‐2 , 2013, Current protocols in human genetics.
[60] Markus Gruber,et al. CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations , 2014, Bioinform..
[61] S. Fields,et al. Deep mutational scanning: a new style of protein science , 2014, Nature Methods.
[62] Steven E. Brenner,et al. SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures , 2013, Nucleic Acids Res..
[63] Zhiyong Wang,et al. MRFalign: Protein Homology Detection through Alignment of Markov Random Fields , 2014, PLoS Comput. Biol..
[64] B. Schulz,et al. Sequence-based protein stabilization in the absence of glycosylation , 2014, Nature Communications.
[65] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[66] Jian Zhou,et al. Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction , 2014, ICML.
[67] Sheng Wang,et al. Protein Homology Detection Through Alignment of Markov Random Fields , 2015, SpringerBriefs in Computer Science.
[68] Quoc V. Le,et al. Semi-supervised Sequence Learning , 2015, NIPS.
[69] B. Rost,et al. Better prediction of functional effects for sequence variants , 2015, BMC Genomics.
[70] Peter B. McGarvey,et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches , 2014, Bioinform..
[71] Debora S. Marks,et al. Quantification of the effect of mutations using a global probability model of natural sequence variation , 2015, 1510.04612.
[72] Andrew J. Hill,et al. Analysis of protein-coding genetic variation in 60,706 humans , 2015, bioRxiv.
[73] A. Tramontano,et al. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI , 2016, Proteins.
[74] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[75] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[76] David A. Scott,et al. Rationally engineered Cas9 nucleases with improved specificity , 2015, Science.
[77] James Y. Zou. Analysis of protein-coding genetic variation in 60,706 humans , 2015, Nature.
[78] M. Weigt,et al. Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1 , 2015, bioRxiv.
[79] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[80] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.
[81] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[82] Jian Peng,et al. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields , 2015, Scientific Reports.
[83] Jinbo Xu,et al. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016 .
[84] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[85] Piotr,et al. Unsupervised Machine Translation Using Monolingual Corpora Only , 2017 .
[86] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[87] Georgios A. Pavlopoulos,et al. Protein structure determination using metagenome sequence data , 2017, Science.
[88] Johannes Söding,et al. MMseqs2: sensitive protein sequence searching for the analysis of massive data sets , 2017, bioRxiv.
[89] David R. Liu,et al. Phage-assisted continuous evolution of proteases with altered substrate specificity , 2017, Nature Communications.
[90] David Baker,et al. Origins of coevolution between residues distant in protein 3D structures , 2017, Proceedings of the National Academy of Sciences.
[91] Zhen Li,et al. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016, bioRxiv.
[92] Maria Jesus Martin,et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments , 2016, Nucleic Acids Res..
[93] David T. Jones,et al. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features , 2018, Bioinform..
[94] David R. Liu,et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity , 2018, Nature.
[95] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[96] Jay Shendure,et al. Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. , 2017, Cell systems.
[97] A. Tramontano,et al. Critical assessment of methods of protein structure prediction (CASP)—Round XII , 2018, Proteins.
[98] Ole Winther,et al. NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning , 2018, bioRxiv.
[99] Kaveri A. Thakoor,et al. High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures , 2018, ArXiv.
[100] P. Donnelly,et al. The UK Biobank resource with deep phenotyping and genomic data , 2018, Nature.
[101] Zachary Wu,et al. Learned protein embeddings for machine learning , 2018, Bioinformatics.
[102] Jie Hou,et al. DeepSF: deep convolutional neural network for mapping protein sequences to folds , 2017, Bioinform..
[103] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[104] Debora S Marks,et al. Deep generative models of genetic variation capture the effects of mutations , 2018, Nature Methods.
[105] Guillaume Lample,et al. Unsupervised Machine Translation Using Monolingual Corpora Only , 2017, ICLR.
[106] Daniel Jurafsky,et al. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context , 2018, ACL.
[107] Jinbo Xu. Distance-based protein folding powered by deep learning , 2018, Proceedings of the National Academy of Sciences.
[108] Burkhard Rost,et al. Modeling aspects of the language of life through transfer-learning protein sequences , 2019, BMC Bioinformatics.
[109] Ole Winther,et al. NetSurfP‐2.0: Improved prediction of protein structural features by integrated deep learning , 2019, Proteins.
[110] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[111] K. Persaud,et al. Effects of point mutations in the binding pocket of the mouse major urinary protein MUP20 on ligand affinity and specificity , 2019, Scientific Reports.
[112] Bonnie Berger,et al. Learning protein sequence embeddings using information from structure , 2019, ICLR.
[113] Christopher D. Manning,et al. A Structural Probe for Finding Syntax in Word Representations , 2019, NAACL.
[114] Ryan L. Collins,et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes , 2019, bioRxiv.
[115] Kevin K. Yang,et al. Machine-learning-guided directed evolution for protein engineering , 2018, Nature Methods.
[116] Aleksej Zelezniak,et al. Expanding functional protein sequence space using generative adversarial networks , 2019, bioRxiv.
[117] M. Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[118] George M. Church,et al. Unified rational protein engineering with sequence-based deep representation learning , 2019, Nature Methods.
[119] Burkhard Rost,et al. End-to-end multitask learning, from protein language to protein features without alignments , 2019, bioRxiv.
[120] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[121] Johannes Söding,et al. HH-suite3 for fast remote homology detection and deep protein annotation , 2019, BMC Bioinformatics.
[122] Debora S. Marks,et al. Accelerating Protein Design Using Autoregressive Generative Models , 2019, bioRxiv.
[123] Luke S. Zettlemoyer,et al. Cloze-driven Pretraining of Self-attention Networks , 2019, EMNLP.
[124] Gregory M. Cooper,et al. CADD: predicting the deleteriousness of variants throughout the human genome , 2018, Nucleic Acids Res..
[125] Ned S Wingreen,et al. Revealing evolutionary constraints on proteins through sequence analysis , 2018, bioRxiv.
[126] Davide Heller,et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses , 2018, Nucleic Acids Res..
[127] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[128] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[129] Alex Wang,et al. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model , 2019, Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation.
[130] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[131] Badri Adhikari. DEEPCON: Protein Contact Prediction using Dilated Convolutional Neural Networks with Dropout , 2019 .
[132] John Canny,et al. Evaluating Protein Transfer Learning with TAPE , 2019, bioRxiv.
[133] Torsten Schwede,et al. Critical assessment of methods of protein structure prediction (CASP)—Round XIII , 2019, Proteins.
[134] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[135] Jinbo Xu. Distance-based protein folding powered by deep learning , 2019, Proceedings of the National Academy of Sciences.
[136] Badri Adhikari,et al. DEEPCON: Protein Contact Prediction using Dilated Convolutional Neural Networks with Dropout , 2019, bioRxiv.
[137] Demis Hassabis,et al. Improved protein structure prediction using potentials from deep learning , 2020, Nature.
[138] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[139] Yang Liu,et al. Evolutionary context-integrated deep sequence modeling for protein engineering , 2020, bioRxiv.
[140] Wojciech Samek,et al. UDSMProt: universal deep sequence models for protein classification , 2020, Bioinformatics.
[141] Tileli Amimeur,et al. Designing Feature-Controlled Humanoid Antibody Discovery Libraries Using Generative Adversarial Networks , 2020, bioRxiv.
[142] Nikhil Naik,et al. ProGen: Language Modeling for Protein Generation , 2020, bioRxiv.
[143] Alex Hawkins-Hooker,et al. Generating functional protein variants with variational autoencoders , 2020, bioRxiv.
[144] Lav R. Varshney,et al. BERTology Meets Biology: Interpreting Attention in Protein Language Models , 2020, bioRxiv.
[145] Jeff Johnson,et al. Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.
[146] Tom Sercu,et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2021, Proceedings of the National Academy of Sciences.