DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
Zhihan Zhou | Han Liu | Yanrong Ji | Ramana V Davuluri
[1] Shaojie Qiao,et al. DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding , 2019, International Journal of Machine Learning and Cybernetics.
[2] Ramana V. Davuluri,et al. In silico analysis of alternative splicing on drug-target gene interactions , 2020, Scientific Reports.
[3] Jaewoo Kang,et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..
[4] Ruohan Wang,et al. SpliceFinder: ab initio prediction of splice sites using convolutional neural network , 2019, BMC Bioinformatics.
[5] A. Sandelin,et al. Determinants of enhancer and promoter activities of regulatory elements , 2019, Nature Reviews Genetics.
[6] Fei Li,et al. Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study , 2019, JMIR medical informatics.
[7] Yu Li,et al. Promoter analysis and prediction in the human genome using sequence-based deep learning models , 2019, Bioinform..
[8] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[9] Jesse Vig,et al. A Multiscale Visualization of Attention in the Transformer Model , 2019, ACL.
[10] Kil To Chong,et al. DeePromoter: Robust Promoter Predictor Using Deep Learning , 2019, Front. Genet..
[11] Helen E. Parkinson,et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019 , 2018, Nucleic Acids Res..
[12] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[13] M. Huss,et al. A primer on deep learning in genomics , 2018, Nature Genetics.
[14] De-Shuang Huang,et al. Recurrent Neural Network for Predicting Transcription Factor Binding Sites , 2018, Scientific Reports.
[15] D. Yan,et al. Interaction of polymorphisms in xeroderma pigmentosum group C with cigarette smoking and pancreatic cancer risk , 2018, Oncology letters.
[16] Abdullah M. Khamis,et al. A novel method for improved accuracy of transcription factor binding site prediction , 2018, Nucleic acids research.
[17] R. O’Malley,et al. Mapping genome-wide transcription-factor binding sites using DAP-seq , 2017, Nature Protocols.
[18] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[19] Albin Sandelin,et al. The Landscape of Isoform Switches in Human Cancers , 2017, Molecular Cancer Research.
[20] Feng Xu,et al. Predicting regulatory variants with composite statistic , 2016, Bioinform..
[21] David R. Kelley,et al. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015, bioRxiv.
[22] Xiaohui S. Xie,et al. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences , 2015, bioRxiv.
[23] O. Troyanskaya,et al. Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.
[24] B. Frey,et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , 2015, Nature Biotechnology.
[25] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[26] S. Gerstberger,et al. A census of human RNA-binding proteins , 2014, Nature Reviews Genetics.
[27] Richard Leslie,et al. GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database , 2014, Bioinform..
[28] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[29] Deanna M. Church,et al. ClinVar: public archive of relationships among sequence variation and human phenotype , 2013, Nucleic Acids Res..
[30] Howard Y. Chang,et al. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position , 2013, Nature Methods.
[31] Giovanna Ambrosini,et al. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era , 2012, Nucleic Acids Res..
[32] David Haussler,et al. ENCODE Data in the UCSC Genome Browser: year 5 update , 2012, Nucleic Acids Res..
[33] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Data production leads,et al. An integrated encyclopedia of DNA elements in the human genome , 2012.
[35] Raymond K. Auerbach,et al. An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.
[36] Bronwen L. Aken,et al. GENCODE: The reference human genome annotation for The ENCODE Project , 2012, Genome research.
[37] Philip Cayting,et al. An encyclopedia of mouse DNA elements (Mouse ENCODE) , 2012, Genome Biology.
[38] Job Dekker,et al. The context of gene expression regulation , 2012, F1000 biology reports.
[39] H. Stunnenberg,et al. Crosstalk between c-Jun and TAp73α/β contributes to the apoptosis–survival balance , 2011, Nucleic acids research.
[40] C. Burge,et al. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. , 2008, RNA.
[41] Sumio Sugano,et al. The functional consequences of alternative promoter use in mammalian genomes. , 2008, Trends in genetics : TIG.
[42] E. Aller,et al. MYO7A mutation screening in Usher syndrome type I patients from diverse origins , 2006, Journal of Medical Genetics.
[43] William Stafford Noble,et al. Quantifying similarity between motifs , 2007, Genome Biology.
[44] V. Solovyev,et al. Automatic annotation of eukaryotic genes, pseudogenes and promoters , 2006, Genome Biology.
[45] A. Ballabio,et al. The Multiple Sulfatase Deficiency Gene Encodes an Essential and Limiting Factor for the Activity of Sulfatases , 2003, Cell.
[46] R. Davuluri. Application of FirstEF to Find Promoters and First Exons in the Human Genome , 2003, Current protocols in bioinformatics.
[47] Colin N. Dewey,et al. Initial sequencing and comparative analysis of the mouse genome. , 2002.
[48] F. Wright,et al. Gene expression profiling of isogenic cells with different TP53 gene dosage reveals numerous genes that are affected by TP53 dosage and identifies CSPG2 as a direct target of p53 , 2002, Proceedings of the National Academy of Sciences of the United States of America.
[49] D. Searls. The language of genes , 2002, Nature.
[50] Olivier Gascuel,et al. Proceedings of the First International Workshop on Algorithms in Bioinformatics , 2001.
[51] Elizabeth M. Smigielski,et al. dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..
[52] S Ji,et al. The Linguistics of DNA: Words, Sentences, Grammar, Phonetics, and Semantics , 1999, Annals of the New York Academy of Sciences.
[53] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[54] H E Stanley,et al. Linguistic features of noncoding DNA sequences. , 1994, Physical review letters.
[55] Shumeet Baluja,et al. Advances in Neural Information Processing , 1994.
[56] David B. Searls,et al. The Linguistics of DNA , 1992.
[57] Tom Head,et al. Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors , 1987.
[58] T. Head. Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors. , 1987, Bulletin of mathematical biology.
[59] V. Brendel,et al. Genome structure described by formal languages. , 1984, Nucleic acids research.
[60] M Nirenberg,et al. RNA codewords and protein synthesis, VII. On the general nature of the RNA code. , 1965, Proceedings of the National Academy of Sciences of the United States of America.