Clustering FunFams using sequence embeddings improves EC purity
B. Rost | C. Orengo | M. Heinzinger | Christian Dallago | Maria Littmann | Konstantin Schütze | Nicola Bordin