暂无分享,去创建一个
[1] Yu Sun,et al. ERNIE: Enhanced Representation through Knowledge Integration , 2019, ArXiv.
[2] Omer Levy,et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans , 2019, TACL.
[3] Jaewoo Kang,et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..
[4] Zhiyuan Liu,et al. Train No Evil: Selective Masking for Task-guided Pre-training , 2020, EMNLP.
[5] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[6] Jianfeng Gao,et al. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training , 2020, ICML.
[7] Yoshua Bengio,et al. Variance Reduction in SGD by Distributed Importance Sampling , 2015, ArXiv.
[8] Lin Xiao,et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..
[9] Ali Farhadi,et al. Defending Against Neural Fake News , 2019, NeurIPS.
[10] Tie-Yan Liu,et al. Variance-reduced Language Pretraining via a Mask Proposal Network , 2020, ArXiv.
[11] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[12] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[13] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[14] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[15] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[16] Xu Tan,et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation , 2019, ICML.
[17] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[18] Luo Si,et al. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding , 2019, ICLR.
[19] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[20] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[21] Junyu Zhang,et al. A Stochastic Composite Gradient Method with Incremental Variance Reduction , 2019, NeurIPS.
[22] Daniel Jurafsky,et al. Measuring the Evolution of a Scientific Field through Citation Frames , 2018, TACL.
[23] Dogu Araci,et al. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models , 2019, ArXiv.
[24] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[25] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[26] N. Weiss. A Course in Probability , 2005 .
[27] Benno Stein,et al. SemEval-2019 Task 4: Hyperpartisan News Detection , 2019, *SEMEVAL.
[28] Xiaodong Liu,et al. Unified Language Model Pre-training for Natural Language Understanding and Generation , 2019, NeurIPS.
[29] Iz Beltagy,et al. SciBERT: A Pretrained Language Model for Scientific Text , 2019, EMNLP.
[30] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[31] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[32] Kyle Lo,et al. S2ORC: The Semantic Scholar Open Research Corpus , 2020, ACL.
[33] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[34] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[35] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[36] Jianfeng Gao,et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention , 2020, ICLR.
[37] Xiang Zhang,et al. Character-level Convolutional Networks for Text Classification , 2015, NIPS.
[38] Richard Socher,et al. Weighted Transformer Network for Machine Translation , 2017, ArXiv.
[39] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[40] Maosong Sun,et al. ERNIE: Enhanced Language Representation with Informative Entities , 2019, ACL.
[41] Mari Ostendorf,et al. Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction , 2018, EMNLP.
[42] Yi Yang,et al. FinBERT: A Pretrained Language Model for Financial Communications , 2020, ArXiv.
[43] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[44] Xi Chen,et al. Variance Reduction for Stochastic Gradient Optimization , 2013, NIPS.