Evaluation of Methods for Protein Representation Learning: A Quantitative Analysis
Muammer Albayrak | Aybar C. Acar | Tunca Doğan | Kemal Turhan | Serbulent Unsal | Heval Ataş
[1] Ruslan Salakhutdinov,et al. Learning Deep Generative Models , 2009 .
[2] Dekang Lin,et al. An Information-Theoretic Definition of Similarity , 1998, ICML.
[3] Burkhard Rost,et al. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing , 2020, bioRxiv.
[4] Jürgen Schmidhuber,et al. Unsupervised Learning in LSTM Recurrent Neural Networks , 2001, ICANN.
[5] Peter B. McGarvey,et al. UniRef: comprehensive and non-redundant UniProt reference clusters , 2007, Bioinform..
[6] Quoc V. Le,et al. Distributed Representations of Sentences and Documents , 2014, ICML.
[7] Prajjwal Bhargava. Adaptive Transformers for Learning Multimodal Representations , 2020, ACL.
[8] Matti Pietikäinen,et al. Deep Learning for Generic Object Detection: A Survey , 2018, International Journal of Computer Vision.
[9] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[10] S. Van Dien,et al. Biotechnology for Chemical Production: Challenges and Opportunities. , 2016, Trends in biotechnology.
[11] Ethan C. Alley,et al. Low-N protein engineering with data-efficient deep learning , 2020, Nature Methods.
[12] Manja Marz,et al. An encoding of genome content for machine learning , 2019 .
[13] James Zou,et al. Feedback GAN for DNA optimizes protein functions , 2019, Nature Machine Intelligence.
[14] Manja Marz,et al. Distributed representations of protein domains and genomes and their compositionality , 2019, bioRxiv.
[15] Arjun K. Bansal,et al. Deep Semantic Protein Representation for Annotation, Discovery, and Engineering , 2018, bioRxiv.
[16] Guillaume Lample,et al. XNLI: Evaluating Cross-lingual Sentence Representations , 2018, EMNLP.
[17] Zachary Wu,et al. Learned protein embeddings for machine learning , 2018, Bioinformatics.
[18] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[19] Gary Geunbae Lee,et al. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2012, ACL 2012.
[20] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[21] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[22] Andrew R. Leach,et al. ChEMBL: towards direct deposition of bioassay data , 2018, Nucleic Acids Res..
[23] Bonnie Berger,et al. Learning protein sequence embeddings using information from structure , 2019, ICLR.
[24] Lorenzo Rosasco,et al. Unsupervised learning of invariant representations , 2016, Theor. Comput. Sci..
[25] Daisuke Kihara,et al. Phylo‐PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences , 2018, Bioinform..
[26] Burkhard Rost,et al. Modeling the language of life – Deep Learning Protein Sequences , 2019, bioRxiv.
[27] Elif Ozkirimli,et al. WideDTA: prediction of drug-target binding affinity , 2019, ArXiv.
[28] Vladimir A. Kulyukin,et al. Generalized Hamming Distance , 2002, Information Retrieval.
[29] Trevor Cohen,et al. Graded Vector Representations of Immunoglobulins Produced in Response to West Nile Virus , 2016, QI.
[30] Sabrina Jaeger,et al. Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition , 2018, J. Chem. Inf. Model..
[31] Jianyang Zeng,et al. Deep learning with feature embedding for compound-protein interaction prediction , 2016, bioRxiv.
[32] Ashish Anand,et al. SpliceVec: distributed feature representations for splice junction prediction , 2017, bioRxiv.
[33] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[34] Jingcheng Du,et al. Gene2vec: distributed representation of genes based on co-expression , 2018, BMC Genomics.
[35] Yibo Wu,et al. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products , 2010, Bioinform..
[36] P. Dobson,et al. Distinguishing enzyme structures from non-enzymes without alignments. , 2003, Journal of molecular biology.
[37] A. Tramontano,et al. Critical assessment of methods of protein structure prediction (CASP)—Round XII , 2018, Proteins.
[38] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .
[39] Helga Thorvaldsdóttir,et al. Molecular signatures database (MSigDB) 3.0 , 2011, Bioinform..
[40] Moshe Wasserblat,et al. Q8BERT: Quantized 8Bit BERT , 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[41] Arzucan Özgür,et al. DeepDTA: deep drug–target binding affinity prediction , 2018, Bioinform..
[42] Ted Pedersen,et al. Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text , 2013, J. Biomed. Informatics.
[43] Michael I. Jordan,et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.
[44] Tapio Salakoski,et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens , 2019, Genome Biology.
[45] E. Birney,et al. Pfam: the protein families database , 2013, Nucleic Acids Res..
[46] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[47] Alán Aspuru-Guzik,et al. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.
[48] Alice C McHardy,et al. Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX) , 2018, Scientific Reports.
[49] James M. Hogan,et al. Distributed Representations for Biological Sequence Analysis , 2016, ArXiv.
[50] Bruce Tidor,et al. Computational design of antibody-affinity improvement beyond in vivo maturation , 2007, Nature Biotechnology.
[51] Myle Ott,et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences , 2019, Proceedings of the National Academy of Sciences.
[52] Volkan Atalay,et al. DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks , 2019, Scientific Reports.
[53] V. Uversky,et al. Why are “natively unfolded” proteins unstructured under physiologic conditions? , 2000, Proteins.
[54] Omer Levy,et al. What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.
[55] Patrick Ng,et al. dna2vec: Consistent vector representations of variable-length k-mers , 2017, ArXiv.
[56] Hilal Tayara,et al. Deep Learning Models Based on Distributed Feature Representations for Alternative Splicing Prediction , 2018, IEEE Access.
[57] Demis Hassabis,et al. Improved protein structure prediction using potentials from deep learning , 2020, Nature.
[58] Roberto A. Chica,et al. Iterative approach to computational enzyme design , 2012, Proceedings of the National Academy of Sciences.
[59] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[60] Julien Mairal,et al. Invariance and Stability of Deep Convolutional Representations , 2017, NIPS.
[61] Maria Jesus Martin,et al. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature , 2018, BMC Bioinformatics.
[62] Zhiyong Lu,et al. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets , 2019, BioNLP@ACL.
[63] U. Rothlisberger,et al. Mixed Quantum Mechanical/Molecular Mechanical Molecular Dynamics Simulations of Biological Systems in Ground and Electronically Excited States. , 2015, Chemical reviews.
[64] Geoffrey I. Webb,et al. POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles , 2017, Bioinform..
[65] Jaegyoon Ahn,et al. G2Vec: Distributed gene representations for identification of cancer prognostic genes , 2018, Scientific Reports.
[66] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[67] Gisbert Schneider,et al. Designing Anticancer Peptides by Constructive Machine Learning , 2018, ChemMedChem.
[68] Tanya Barrett,et al. The Gene Expression Omnibus Database , 2016, Statistical Genomics.
[69] Diogo A. R. S. Latino,et al. Assignment of EC Numbers to Enzymatic Reactions with MOLMAP Reaction Descriptors and Random Forests , 2009, J. Chem. Inf. Model..
[70] Björn Wallner,et al. rawMSA: End-to-end Deep Learning using raw Multiple Sequence Alignments , 2019, PloS one.
[71] C. Spearman. The proof and measurement of association between two things. , 2015, International journal of epidemiology.
[72] Ron O. Dror,et al. Molecular Dynamics Simulation for All , 2018, Neuron.
[73] Wei Li,et al. RaptorX-Property: a web server for protein structure property prediction , 2016, Nucleic Acids Res..
[74] O. Keskin,et al. Predicting Protein-Protein Interactions from the Molecular to the Proteome Level. , 2016, Chemical reviews.
[75] Kenji Satou,et al. Improving Protein Sequence Classification Performance Using Adjacent and Overlapped Segments on Existing Protein Descriptors , 2018 .
[76] Yanjun Qi,et al. A Unified Multitask Architecture for Predicting Local Protein Properties , 2012, PloS one.
[77] L. Holm,et al. The Pfam protein families database , 2005, Nucleic Acids Res..
[78] Pelkins Ajanoh,et al. Augmenting protein network embeddings with sequence information , 2019, bioRxiv.
[79] Steve Renals,et al. Multiplicative LSTM for sequence modelling , 2016, ICLR.
[80] M. Vendruscolo,et al. Statistical mechanics of the denatured state of a protein using replica-averaged metadynamics. , 2014, Journal of the American Chemical Society.
[81] Jaewoo Kang,et al. Mut2Vec: distributed representation of cancerous mutations , 2018, BMC Medical Genomics.
[82] Yi Xiong,et al. GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank , 2017, bioRxiv.
[83] Xiaoqin Zou,et al. Statistical mechanics‐based method to extract atomic distance‐dependent potentials from protein structures , 2011, Proteins.
[84] John P. Overington,et al. HOMSTRAD: A database of protein structure alignments for homologous families , 1998, Protein science : a publication of the Protein Society.
[85] Ehsaneddin Asgari,et al. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics , 2015, PloS one.
[86] Jürgen Schmidhuber. Deep learning in neural networks , 2015 .
[87] Seishi Shimizu,et al. Cooperativity principles in protein folding. , 2004, Methods in enzymology.
[88] John Canny,et al. Evaluating Protein Transfer Learning with TAPE , 2019, bioRxiv.
[89] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[90] George M. Church,et al. Unified rational protein engineering with sequence-only deep representation learning , 2019, bioRxiv.
[91] James C. Hu,et al. The Gene Ontology Resource: 20 years and still GOing strong , 2019 .
[92] Namrata Anand,et al. Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation , 2020, bioRxiv.
[93] Kuo-Chen Chou,et al. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..
[94] M. Huss,et al. A primer on deep learning in genomics , 2018, Nature Genetics.
[95] L. Looger,et al. Computational design of receptor and sensor proteins with novel functions , 2003, Nature.
[96] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.
[97] B. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.
[98] Hamed Haddadi,et al. Deep Learning in Mobile and Wireless Networking: A Survey , 2018, IEEE Communications Surveys & Tutorials.
[99] A. Tramontano,et al. Critical assessment of methods of protein structure prediction (CASP)—round IX , 2011, Proteins.
[100] Chi Hang Wong,et al. Infer related genes from large scale gene expression dataset with embedding , 2018, bioRxiv.
[101] David Baker,et al. An exciting but challenging road ahead for computational enzyme design , 2010, Protein science : a publication of the Protein Society.
[102] Yang Liu,et al. On Identifiability in Transformers , 2020, ICLR.
[103] Eric A. Althoff,et al. Kemp elimination catalysts by computational enzyme design , 2008, Nature.
[104] Zhaoyu Li,et al. Deep Networks and Continuous Distributed Representation of Protein Sequences for Protein Quality Assessment , 2017, 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI).
[105] Wei Zhang,et al. A point‐charge force field for molecular mechanics simulations of proteins based on condensed‐phase quantum mechanical calculations , 2003, J. Comput. Chem..
[106] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[107] Shanfeng Zhu,et al. DeepText2Go: Improving large-scale protein function prediction with deep semantic text representation , 2017, 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
[108] David T. Jones,et al. Design of metalloproteins and novel protein folds using variational autoencoders , 2018, Scientific Reports.
[109] Samuel Karlin,et al. Protein length in eukaryotic and prokaryotic proteomes , 2005, Nucleic acids research.
[110] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[111] Frank DiMaio,et al. Protein structure prediction using Rosetta in CASP12 , 2018, Proteins.
[112] Ashish Anand,et al. SpliceVec: distributed feature representations for splice junction prediction , 2017, bioRxiv.
[113] Jason Weston,et al. Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding , 2011, PLoS Comput. Biol..
[114] J. Selbig,et al. SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data , 2011, Front. Plant Sci..
[115] Jiangning Song,et al. PhosContext2vec: a distributed representation of residue-level sequence contexts and its application to general and kinase-specific phosphorylation site prediction , 2018, Scientific Reports.
[116] David Pfau,et al. Towards a Definition of Disentangled Representations , 2018, ArXiv.
[117] Marcello Farina,et al. LSTM Neural Networks: Input to State Stability and Probabilistic Safety Verification , 2019, L4DC.
[118] Regina Barzilay,et al. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction , 2017, J. Chem. Inf. Model..
[119] Alexey G. Murzin,et al. SCOP2 prototype: a new approach to protein structure mining , 2014, Nucleic Acids Res..
[120] D. M. Titterington,et al. Comment on “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes” , 2008, Neural Processing Letters.
[121] Wojciech Samek,et al. UDSMProt: universal deep sequence models for protein classification , 2019, bioRxiv.
[122] Andrew Y. Ng,et al. Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.
[123] Pablo Gainza,et al. Algorithms for protein design. , 2016, Current opinion in structural biology.
[124] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[125] Andre Esteva,et al. A guide to deep learning in healthcare , 2019, Nature Medicine.
[126] Kui Zhang,et al. Prediction of protein function using protein-protein interaction data , 2002, Proceedings. IEEE Computer Society Bioinformatics Conference.
[127] Simona Cocco,et al. Learning protein constitutive motifs from sequence data , 2018, eLife.
[128] Gajendra P. S. Raghava,et al. OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy , 2003, BMC Bioinformatics.
[129] Alice C. McHardy,et al. DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences , 2019, bioRxiv.
[130] Stephen Merity,et al. Single Headed Attention RNN: Stop Thinking With Your Head , 2019, ArXiv.
[131] Martin Wattenberg,et al. Visualizing and Measuring the Geometry of BERT , 2019, NeurIPS.
[132] Valerie Daggett,et al. Insights from molecular dynamics simulations for computational protein design. , 2017, Molecular systems design & engineering.
[133] Byron C. Wallace,et al. Attention is not Explanation , 2019, NAACL.
[134] M. K. Mejía-Guerra,et al. A k-mer grammar analysis to uncover maize regulatory architecture , 2019, BMC Plant Biology.
[135] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[136] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[137] Niles A Pierce,et al. Protein design is NP-hard. , 2002, Protein engineering.