ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing
Burkhard Rost | Yu Wang | Christoph Angerer | Michael Heinzinger | Llion Jones | Debsindhu Bhowmik | Tom Gibbs | Martin Steinegger | Ahmed Elnaggar | Christian Dallago | Ghalia Rehawi | Tamas Feher
[1] C. Anfinsen,et al. Studies on the reduction and re-formation of protein disulfide bonds , 1961, The Journal of biological chemistry.
[2] A. Dillmann. Enzyme Nomenclature , 1965, Nature.
[3] A. Bairoch,et al. The SWISS-PROT protein sequence data bank , 1991, Nucleic acids research.
[4] S. Henikoff,et al. Amino acid substitution matrices from protein blocks , 1992, Proceedings of the National Academy of Sciences of the United States of America.
[5] E. Webb. Enzyme nomenclature 1992. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes , 1992.
[7] Pieter de Haan,et al. Characteristics of Sentence Length in Running Text , 1993.
[8] B. Rost,et al. Improved prediction of protein secondary structure by use of sequence profiles and neural networks , 1993, Proceedings of the National Academy of Sciences of the United States of America.
[9] B. Rost,et al. Prediction of protein secondary structure at better than 70% accuracy , 1993, Journal of molecular biology.
[10] B. Rost,et al. Combining evolutionary information and neural networks to predict protein secondary structure , 1994, Proteins.
[11] B. Rost,et al. Pitfalls of protein sequence analysis , 1996, Current opinion in biotechnology.
[12] B. Rost. PHD: predicting one-dimensional protein structure by profile-based neural networks , 1996, Methods in enzymology.
[13] Malmqvist,et al. Epitope Mapping by Label-Free Biomolecular Interaction Analysis , 1996, Methods.
[14] B. Rost,et al. Bridging the protein sequence-structure gap by structure predictions , 1996, Annual review of biophysics and biomolecular structure.
[15] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[16] C. Pabo,et al. High-resolution structures of variant Zif268-DNA complexes: implications for understanding zinc finger-DNA recognition , 1998, Structure.
[17] G J Barton,et al. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.
[18] T. N. Bhat,et al. The Protein Data Bank , 2000, Nucleic Acids Res.
[19] Amos Bairoch,et al. The ENZYME database in 2000 , 2000, Nucleic Acids Res.
[20] Guoli Wang,et al. PISCES: a protein sequence culling server , 2003, Bioinform.
[21] R. Durbin,et al. Enhanced protein domain discovery by using language modeling techniques from speech recognition , 2003, Proceedings of the National Academy of Sciences of the United States of America.
[22] Burkhard Rost,et al. Improving fold recognition without folds , 2004, Journal of molecular biology.
[23] P. Radivojac,et al. Protein flexibility and intrinsic disorder , 2004, Protein science : a publication of the Protein Society.
[24] David Kirk,et al. NVIDIA cuda software and gpu parallel computing architecture , 2007, ISMM '07.
[25] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008.
[26] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[27] David A. Lee,et al. PSI-2: structural genomics to cover protein domain family space , 2009, Structure.
[28] Thomas A. Hopf,et al. Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.
[29] Milo M. Lin,et al. Hydrophobic forces and the length limit of foldable protein domains , 2012, Proceedings of the National Academy of Sciences.
[30] Thomas A. Hopf,et al. Three-Dimensional Structures of Membrane Proteins from Genomic Sequencing , 2012, Cell.
[31] Burkhard Rost,et al. Supporting online material for: LocTree 2 predicts localization for all domains of life , 2012.
[32] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[33] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[34] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[35] Christian Cole,et al. JPred4: a protein secondary structure prediction server , 2015, Nucleic Acids Res.
[36] B. Rost,et al. Unexpected features of the dark proteome , 2015, Proceedings of the National Academy of Sciences.
[37] Ehsaneddin Asgari,et al. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics , 2015, PloS one.
[38] Peter B. McGarvey,et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches , 2014, Bioinform.
[39] Martin Kühn,et al. Extreme Scale-out SuperMUC Phase 2 - lessons learned , 2015, PARCO.
[40] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[41] David T. Jones,et al. Accurate contact predictions using covariation techniques and machine learning , 2015, Proteins.
[42] B. Rost,et al. TMSEG: Novel prediction of transmembrane helices , 2016, Proteins.
[43] Matthijs Douze,et al. FastText.zip: Compressing text classification models , 2016, ArXiv.
[44] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[45] Jian Peng,et al. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields , 2015, Scientific Reports.
[46] Jeff Nichols,et al. Announcing Supercomputer Summit , 2016.
[47] Wei Li,et al. RaptorX-Property: a web server for protein structure property prediction , 2016, Nucleic Acids Res.
[48] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[49] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[50] Ole Winther,et al. DeepLoc: prediction of protein subcellular localization using deep learning , 2017, Bioinform.
[51] Johannes Söding,et al. MMseqs2: sensitive protein sequence searching for the analysis of massive data sets , 2017, bioRxiv.
[52] Kuldip K. Paliwal,et al. Capturing non‐local interactions by long short‐term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility , 2017, Bioinform.
[53] Jianzhi Zhang,et al. Evolutionary adaptations to new environments generally reverse plastic phenotypic changes , 2018, Nature Communications.
[54] Sameer Kumar,et al. PowerAI DDL , 2017, ArXiv.
[55] Sebastian Ruder,et al. Universal Language Model Fine-tuning for Text Classification , 2018, ACL.
[56] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[57] Thomas A. Hopf,et al. Evolutionary couplings and sequence variation effect predict protein binding sites , 2018, Proteins.
[58] Mohammed AlQuraishi,et al. End-to-end differentiable learning of protein structure , 2018, bioRxiv.
[59] R. Plemper,et al. Promotion of virus assembly and organization by the measles virus matrix protein , 2018, Nature Communications.
[60] Seán I O'Donoghue,et al. Dark Proteins Important for Cellular Function , 2018, Proteomics.
[61] Johannes Söding,et al. Clustering huge protein sequence sets in linear time , 2018.
[62] Andriy Kryshtafovych,et al. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment‐based contact prediction methods , 2018, Proteins.
[63] Alexander Sergeev,et al. Horovod: fast and easy distributed deep learning in TensorFlow , 2018, ArXiv.
[64] Kiyokuni Kawachiya,et al. TFLMS: Large Model Support in TensorFlow by Graph Rewriting , 2018, ArXiv.
[65] Kuldip K. Paliwal,et al. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? , 2016, Briefings Bioinform.
[66] Burkhard Rost,et al. Modeling aspects of the language of life through transfer-learning protein sequences , 2019, BMC Bioinformatics.
[67] Jesse Vig,et al. A Multiscale Visualization of Attention in the Transformer Model , 2019, ACL.
[68] Ole Winther,et al. NetSurfP‐2.0: Improved prediction of protein structural features by integrated deep learning , 2019, Proteins.
[69] Myle Ott,et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2019, Proceedings of the National Academy of Sciences.
[70] Mohammed AlQuraishi,et al. ProteinNet: a standardized data set for machine learning of protein structure , 2019, BMC Bioinformatics.
[71] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[72] The UniProt Consortium,et al. UniProt: a worldwide hub of protein knowledge , 2018, Nucleic Acids Res.
[73] Steven E. Brenner,et al. SCOPe: classification of large macromolecular structures in the structural classification of proteins—extended database , 2018, Nucleic Acids Res.
[74] Bonnie Berger,et al. Learning protein sequence embeddings using information from structure , 2019, ICLR.
[75] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[76] Manaal Faruqui,et al. Attention Interpretability Across NLP Tasks , 2019, ArXiv.
[78] Alice C McHardy,et al. Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX) , 2018, Scientific Reports.
[79] M. Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[80] John N. Weinstein,et al. ElemCor: accurate data analysis and enrichment calculation for high-resolution LC-MS stable isotope labeling experiments , 2019, BMC Bioinformatics.
[81] Johannes Söding,et al. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold , 2018, Nature Methods.
[82] George M. Church,et al. Unified rational protein engineering with sequence-based deep representation learning , 2019, Nature Methods.
[83] C. L. Rees,et al. Quantitative firing pattern phenotyping of hippocampal neuron types , 2018, Scientific Reports.
[84] Regina Barzilay,et al. Generative Models for Graph-Based Protein Design , 2019, DGS@ICLR.
[85] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[86] Richard Socher,et al. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation , 2018, ICLR.
[87] D. Frishman,et al. Pred‐MutHTP: Prediction of disease‐causing and neutral mutations in human transmembrane proteins , 2019, Human mutation.
[88] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[89] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[90] John Canny,et al. Evaluating Protein Transfer Learning with TAPE , 2019, bioRxiv.
[91] Tom Sercu,et al. Transformer protein language models are unsupervised structure learners , 2020, bioRxiv.
[92] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[93] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[94] Henrik Nielsen,et al. Language modelling for biological sequences – curated datasets and baselines , 2020, bioRxiv.
[95] Jianyi Yang,et al. Improved protein structure prediction using predicted interresidue orientations , 2020, Proceedings of the National Academy of Sciences.
[96] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.
[97] Burkhard Rost,et al. Embeddings from deep learning transfer GO annotations beyond homology , 2020, Scientific Reports.
[98] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[99] Ananthan Nambiar,et al. Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks , 2020, bioRxiv.
[100] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[101] Samyam Rajbhandari,et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models , 2019, ArXiv.
[102] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[103] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[104] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[105] Nikhil Naik,et al. ProGen: Language Modeling for Protein Generation , 2020, bioRxiv.
[106] B. Rost,et al. Light attention predicts protein location from the language of life , 2021, bioRxiv.
[107] Dong Huang,et al. Optimal Gradient Checkpoint Search for Arbitrary Computation Graphs , 2018, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[108] B. Rost,et al. PredictProtein - Predicting Protein Structure and Function for 29 Years , 2021, bioRxiv.
[109] Seunghyun Park,et al. Pre-Training of Deep Bidirectional Protein Sequence Representations With Structural Information , 2019, IEEE Access.
[110] Tom Sercu,et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2021, Proceedings of the National Academy of Sciences.
[111] John F. Canny,et al. MSA Transformer , 2021, bioRxiv.