The language of proteins: NLP, machine learning & protein sequences
[1] Sanjeev Arora,et al. A Simple but Tough-to-Beat Baseline for Sentence Embeddings , 2017, ICLR.
[2] Inbal Budowski-Tal,et al. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately , 2010, Proceedings of the National Academy of Sciences.
[3] Sandor Vajda,et al. CAPRI: A Critical Assessment of PRedicted Interactions , 2003, Proteins.
[4] B. Rost,et al. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing , 2020, bioRxiv.
[5] Lav R. Varshney,et al. CTRL: A Conditional Transformer Language Model for Controllable Generation , 2019, ArXiv.
[6] Orna Man,et al. Proteomic signatures: Amino acid and oligopeptide compositions differentiate among phyla , 2003, Proteins.
[7] Michal Linial,et al. ASAP: a machine learning framework for local protein properties , 2015, bioRxiv.
[8] Johannes Söding,et al. MMseqs2: sensitive protein sequence searching for the analysis of massive data sets , 2017, bioRxiv.
[9] Noah A. Smith. Contextual Word Representations: A Contextual Introduction , 2019, ArXiv.
[10] Y. Singer,et al. Ultraconservative online algorithms for multiclass problems , 2003, J. Mach. Learn. Res.
[11] Byunghan Lee,et al. Deep learning in bioinformatics , 2016, Briefings Bioinform.
[12] Piero Fariselli,et al. DeepSig: deep learning improves signal peptide detection in proteins , 2017, Bioinform.
[13] Wojciech Samek,et al. UDSMProt: universal deep sequence models for protein classification , 2019, bioRxiv.
[14] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[15] Chenxi Liu,et al. Deep Nets: What have They Ever Done for Vision? , 2018, International Journal of Computer Vision.
[16] Rob Phillips,et al. Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment , 2009, Bioinform.
[17] Ellen D. Zhong,et al. Learning the language of viral evolution and escape , 2020, Science.
[18] Xiaocheng Feng,et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages , 2020, EMNLP.
[19] Taku Kudo,et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates , 2018, ACL.
[20] E. Trifonov. The origin of the genetic code and of the earliest oligopeptides , 2009, Research in microbiology.
[21] Michal Linial,et al. Families of membranous proteins can be characterized by the amino acid composition of their transmembrane domains , 2005, ISMB.
[22] R. Levy,et al. Simplified amino acid alphabets for protein fold recognition and implications for folding , 2000, Protein engineering.
[23] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[24] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[25] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[26] Patricia C. Babbitt,et al. Biases in the Experimental Annotations of Protein Function and Their Effect on Our Understanding of Protein Function Space , 2013, PLoS Comput. Biol.
[27] Hannah Currant,et al. FFPred 3: feature-based function prediction for all Gene Ontology domains , 2016, Scientific Reports.
[28] Michael Krauthammer,et al. Neural networks versus Logistic regression for 30 days all-cause readmission prediction , 2018, Scientific Reports.
[29] Chengsheng Mao,et al. KG-BERT: BERT for Knowledge Graph Completion , 2019, ArXiv.
[30] Sebastian Ruder,et al. Universal Language Model Fine-tuning for Text Classification , 2018, ACL.
[31] F. Arnold,et al. Signal Peptides Generated by Attention-Based Neural Networks , 2020, ACS synthetic biology.
[32] Richard Socher,et al. Learned in Translation: Contextualized Word Vectors , 2017, NIPS.
[33] Burkhard Rost,et al. Modeling aspects of the language of life through transfer-learning protein sequences , 2019, BMC Bioinformatics.
[34] T. N. Bhat,et al. The Protein Data Bank , 2000, Nucleic Acids Res.
[35] Michel Schneider,et al. UniProtKB/Swiss-Prot , 2007, Methods in molecular biology.
[36] T. G. Dewey,et al. The Shannon information entropy of protein sequences , 1996, Biophysical journal.
[37] Michael McGill,et al. Introduction to Modern Information Retrieval , 1983 .
[38] Hiroyuki Shindo,et al. Neural Attentive Bag-of-Entities Model for Text Classification , 2019, CoNLL.
[39] Henrik Nielsen,et al. Language modelling for biological sequences – curated datasets and baselines , 2020, bioRxiv.
[40] Oliver Kohlbacher,et al. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition , 2006, Bioinform.
[41] A. Mignan,et al. One neuron versus deep learning in aftershock prediction , 2019, Nature.
[42] Thomas L. Griffiths,et al. Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words , 2017, CogSci.
[43] Srikumar Venugopal,et al. Scalable Protein Sequence Similarity Search using Locality-Sensitive Hashing and MapReduce , 2013, ArXiv.
[44] Claude E. Shannon,et al. Prediction and Entropy of Printed English , 1951 .
[45] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[46] Omer Levy,et al. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method , 2014, ArXiv.
[47] Zachary Wu,et al. Learned protein embeddings for machine learning , 2018, Bioinformatics.
[48] Myle Ott,et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2019, Proceedings of the National Academy of Sciences.
[49] O. Stegle,et al. Deep learning for computational biology , 2016, Molecular systems biology.
[50] Michal Linial,et al. ProFET: Feature engineering captures high-level protein functions , 2015, Bioinform.
[51] Ting Chen,et al. Speeding up tandem mass spectrometry database search: metric embeddings and fast near neighbor search , 2007, Bioinform.
[52] Thomas A. Funkhouser,et al. Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[54] Michal Linial,et al. When Less Is More: Improving Classification of Protein Families with a Minimal Set of Global Features , 2007, WABI.
[55] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[56] Tomas Mikolov,et al. Bag of Tricks for Efficient Text Classification , 2016, EACL.
[57] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[58] Jason Weston,et al. Mismatch string kernels for discriminative protein classification , 2004, Bioinform.
[59] D. Sculley,et al. Using deep learning to annotate the protein universe , 2019, Nature Biotechnology.
[60] Michal Linial,et al. ClanTox: a classifier of short animal toxins , 2009, Nucleic Acids Res.
[61] Bing Zhang,et al. Deep Learning in Proteomics , 2020, Proteomics.
[62] Michal Linial,et al. Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data , 2010, Biology Direct.
[63] Bonnie Berger,et al. Learning protein sequence embeddings using information from structure , 2019, ICLR.
[64] Qiang Zhou,et al. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 , 2020, Science.
[65] Michal Linial,et al. The complete peptide dictionary – A meta-proteomics resource , 2010, Proteomics.
[66] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[67] Michal Linial,et al. NeuroPID: a predictor for identifying neuropeptide precursors from metazoan proteomes , 2014, Bioinform.
[68] J. Hoh,et al. Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein , 2004, FEBS letters.
[69] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[70] John Canny,et al. Evaluating Protein Transfer Learning with TAPE , 2019, bioRxiv.
[71] Zhihan Zhou,et al. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome , 2020, bioRxiv.
[72] Olivier Raiman,et al. DeepType: Multilingual Entity Linking by Neural Type System Evolution , 2018, AAAI.
[73] Michael Heinzinger,et al. Embeddings from deep learning transfer GO annotations beyond homology , 2021, Scientific reports.
[74] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[75] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[76] Nikhil Naik,et al. ProGen: Language Modeling for Protein Generation , 2020, bioRxiv.
[77] Bruce R. Southey,et al. Evaluation of Database Search Programs for Accurate Detection of Neuropeptides in Tandem Mass Spectrometry Experiments , 2012, Journal of proteome research.
[78] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[79] Guillaume Lample,et al. Deep Learning for Symbolic Mathematics , 2019, ICLR.
[80] Tapio Salakoski,et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy , 2016, Genome Biology.
[81] Peter Norvig,et al. The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.
[82] A. Tramontano,et al. Critical assessment of methods of protein structure prediction (CASP)—round IX , 2011, Proteins.
[83] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[84] Lav R. Varshney,et al. BERTology Meets Biology: Interpreting Attention in Protein Language Models , 2020, bioRxiv.
[85] D. Ofer,et al. Machine Learning for Protein Function , 2016, arXiv:1603.02021.
[86] Lukasz Kaiser,et al. Rethinking Attention with Performers , 2020, ArXiv.
[87] George M. Church,et al. Unified rational protein engineering with sequence-based deep representation learning , 2019, Nature Methods.
[88] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[89] Ehsaneddin Asgari,et al. ProtVec: A Continuous Distributed Representation of Biological Sequences , 2015, ArXiv.
[90] Alice C McHardy,et al. Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX) , 2018, Scientific Reports.
[91] Lefteris Koumakis,et al. Deep learning models in genomics; are we there yet? , 2020, Computational and structural biotechnology journal.
[92] Ole Winther,et al. DeepLoc: prediction of protein subcellular localization using deep learning , 2017, Bioinform.
[93] O. B. Ptitsyn. How does protein synthesis give rise to the 3D-structure? , 1991, FEBS letters.
[94] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[95] Demis Hassabis,et al. Improved protein structure prediction using potentials from deep learning , 2020, Nature.
[96] Li Yang,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[97] Geoffrey E. Hinton,et al. Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.
[98] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[99] Paul Pavlidis,et al. Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA) , 2013, BMC Bioinformatics.
[100] Wojciech Samek,et al. UDSMProt: universal deep sequence models for protein classification , 2020, Bioinformatics.
[101] Torsten Schwede,et al. Critical assessment of methods of protein structure prediction (CASP)—Round XIII , 2019, Proteins.
[102] I. Xenarios,et al. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View , 2016, Methods in molecular biology.
[103] Eleazar Eskin,et al. The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.
[104] E. Horvitz,et al. On Biases Of Attention In Scientific Discovery , 2020, Bioinformatics.
[105] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.
[106] A. Biegert,et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.
[107] Wang Liang,et al. Detecting "protein words" through unsupervised word segmentation , 2014.
[108] Nuo Wang Pierse,et al. Aligning the Pretraining and Finetuning Objectives of Language Models , 2020, ArXiv.
[109] Douglas L. Brutlag,et al. Sequence Motifs: Highly Predictive Features of Protein Function , 2006, Feature Extraction.
[110] Malay Kumar Basu,et al. Grammar of protein domain architectures , 2019, Proceedings of the National Academy of Sciences.
[111] D. Baker,et al. Global analysis of protein folding using massively parallel design, synthesis, and testing , 2017, Science.
[112] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[113] Xiao Li,et al. A High Efficient Biological Language Model for Predicting Protein–Protein Interactions , 2019, Cells.
[114] Cesare Furlanello,et al. Machine learning methods for predictive proteomics , 2007, Briefings Bioinform.
[115] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[116] Eytan Ruppin,et al. Unsupervised learning of natural languages , 2006.
[117] Alexander M. Rush,et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.
[118] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[119] John F. Canny,et al. MSA Transformer , 2021, bioRxiv.
[120] Georgios A. Pavlopoulos,et al. Protein-protein interaction predictions using text mining methods , 2015, Methods.