Yoav Shoham | Or Sharir | Barak Peleg
[1] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[2] Neil Genzlinger. A. and Q, 2006.
[3] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[4] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[5] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[6] Jonathan S. Rosenfeld, et al. A Constructive Prediction of the Generalization Error Across Scales, 2020, ICLR.
[7] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[8] Matthijs Douze, et al. Fixing the train-test resolution discrepancy, 2019, NeurIPS.
[9] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[10] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[11] Yoav Shoham, et al. SenseBERT: Driving Some Sense into BERT, 2019, ACL.
[12] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[13] Donald Geman, et al. Visual Turing test for computer vision systems, 2015, Proceedings of the National Academy of Sciences.
[14] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[15] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[16] Roy Schwartz, et al. Show Your Work: Improved Reporting of Experimental Results, 2019, EMNLP.