ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.
B. Rost | Llion Jones | Tom Gibbs | Ahmed Elnaggar | M. Heinzinger | Christian Dallago | Ghalia Rehawi | Tamas B. Fehér | Christoph Angerer | Martin Steinegger | D. Bhowmik | Wang Yu
[1] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[2] Burkhard Rost, et al. Improving fold recognition without folds, 2004, Journal of Molecular Biology.
[3] Guoli Wang, et al. PISCES: a protein sequence culling server, 2003, Bioinformatics.
[4] Burkhard Rost, et al. Modeling aspects of the language of life through transfer-learning protein sequences, 2019, BMC Bioinformatics.
[5] P. Radivojac, et al. Protein flexibility and intrinsic disorder, 2004, Protein Science.
[6] Myle Ott, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, 2019, Proceedings of the National Academy of Sciences.
[7] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.
[8] Henrik Nielsen, et al. Language modelling for biological sequences – curated datasets and baselines, 2020, bioRxiv.
[9] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[10] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[11] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[12] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[13] Jianyi Yang, et al. Improved protein structure prediction using predicted interresidue orientations, 2020, Proceedings of the National Academy of Sciences.
[14] The UniProt Consortium, et al. UniProt: a worldwide hub of protein knowledge, 2018, Nucleic Acids Research.
[15] B. Rost, et al. Unexpected features of the dark proteome, 2015, Proceedings of the National Academy of Sciences.
[16] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[17] Steven E. Brenner, et al. SCOPe: classification of large macromolecular structures in the structural classification of proteins–extended database, 2018, Nucleic Acids Research.
[18] B. Rost, et al. Combining evolutionary information and neural networks to predict protein secondary structure, 1994, Proteins.
[19] Pieter de Haan, et al. Characteristics of Sentence Length in Running Text, 1993.
[20] B. Rost. PHD: predicting one-dimensional protein structure by profile-based neural networks, 1996, Methods in Enzymology.
[21] Ole Winther, et al. DeepLoc: prediction of protein subcellular localization using deep learning, 2017, Bioinformatics.
[22] Bonnie Berger, et al. Learning protein sequence embeddings using information from structure, 2019, ICLR.
[23] Johannes Söding, et al. Clustering huge protein sequence sets in linear time, 2017, Nature Communications.
[24] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[25] Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics, 2015, PLoS ONE.
[26] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[27] B. Rost, et al. Improved prediction of protein secondary structure by use of sequence profiles and neural networks, 1993, Proceedings of the National Academy of Sciences of the United States of America.
[28] B. Rost, et al. TMSEG: Novel prediction of transmembrane helices, 2016, Proteins.
[29] Thomas A. Hopf, et al. Protein 3D Structure Computed from Evolutionary Sequence Variation, 2011, PLoS ONE.
[30] T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Research.
[31] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[32] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[33] Ole Winther, et al. NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning, 2018, bioRxiv.
[34] Matthijs Douze, et al. FastText.zip: Compressing text classification models, 2016, arXiv.
[35] Ananthan Nambiar, et al. Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks, 2020, bioRxiv.
[36] Dong Huang, et al. Optimal Gradient Checkpoint Search for Arbitrary Computation Graphs, 2018, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] B. Rost, et al. Prediction of protein secondary structure at better than 70% accuracy, 1993, Journal of Molecular Biology.
[38] Thomas A. Hopf, et al. Evolutionary couplings and sequence variation effect predict protein binding sites, 2018, Proteins.
[39] Samyam Rajbhandari, et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models, 2019, arXiv.
[40] Peter B. McGarvey, et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches, 2014, Bioinformatics.
[41] George M. Church, et al. Unified rational protein engineering with sequence-only deep representation learning, 2019, bioRxiv.
[42] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[43] David Kirk, et al. NVIDIA CUDA software and GPU parallel computing architecture, 2007, ISMM '07.
[44] Martin Kühn, et al. Extreme Scale-out SuperMUC Phase 2 - lessons learned, 2015, PARCO.
[45] Amos Bairoch, et al. The ENZYME database in 2000, 2000, Nucleic Acids Research.
[46] Mohammed AlQuraishi, et al. End-to-end differentiable learning of protein structure, 2018, bioRxiv.
[47] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, arXiv.
[48] Seunghyun Park, et al. Pre-Training of Deep Bidirectional Protein Sequence Representations With Structural Information, 2019, IEEE Access.
[49] Johannes Söding, et al. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold, 2018, Nature Methods.
[50] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[51] A. Dillmann. Enzyme Nomenclature, 1965, Nature.
[52] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.
[53] Johannes Söding, et al. MMseqs2: sensitive protein sequence searching for the analysis of massive data sets, 2017, bioRxiv.
[54] B. Rost, et al. Bridging the protein sequence-structure gap by structure predictions, 1996, Annual Review of Biophysics and Biomolecular Structure.
[55] G. J. Barton, et al. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction, 1999, Proteins.
[56] Regina Barzilay, et al. Generative Models for Graph-Based Protein Design, 2019, DGS@ICLR.
[57] E. Webb. Enzyme nomenclature 1992. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes, 1992.
[58] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.
[59] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[60] Richard Socher, et al. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation, 2018, ICLR.
[61] Nikhil Naik, et al. ProGen: Language Modeling for Protein Generation, 2020, bioRxiv.
[62] D. Frishman, et al. Pred-MutHTP: Prediction of disease-causing and neutral mutations in human transmembrane proteins, 2019, Human Mutation.
[63] Andriy Kryshtafovych, et al. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment-based contact prediction methods, 2018, Proteins.
[64] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[65] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[66] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, arXiv.
[67] Jeff Nichols, et al. Announcing Supercomputer Summit, 2016.
[68] John Canny, et al. Evaluating Protein Transfer Learning with TAPE, 2019, bioRxiv.
[69] Kiyokuni Kawachiya, et al. TFLMS: Large Model Support in TensorFlow by Graph Rewriting, 2018, arXiv.
[70] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[71] Sameer Kumar, et al. PowerAI DDL, 2017, arXiv.
[72] Kuldip K. Paliwal, et al. Sixty-five years of the long march in protein secondary structure prediction: the final stretch?, 2016, Briefings in Bioinformatics.