暂无分享,去创建一个
[1] Hinrich Schutze,et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners , 2020, NAACL.
[2] Gao Cong,et al. Tagging Your Tweets: A Probabilistic Modeling of Hashtag Annotation in Twitter , 2014, CIKM.
[3] Anna Rumshisky,et al. A Primer in BERTology: What We Know About How BERT Works , 2020, Transactions of the Association for Computational Linguistics.
[4] Yiming Yang,et al. X-BERT: eXtreme Multi-label Text Classification with using Bidirectional Encoder Representations from Transformers , 2019 .
[5] James Henderson,et al. GILE: A Generalized Input-Label Embedding for Text Classification , 2018, TACL.
[6] Ser-Nam Lim,et al. A Metric Learning Reality Check , 2020, ECCV.
[7] Lei Yu,et al. Learning and Evaluating General Linguistic Intelligence , 2019, ArXiv.
[8] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[9] N. Rethmeier,et al. EffiCare: Better Prognostic Models via Resource-Efficient Health Embeddings , 2020, medRxiv.
[10] Francisco Herrera,et al. Learning from Imbalanced Data Sets , 2018, Springer International Publishing.
[11] Madian Khabsa,et al. To Pretrain or Not to Pretrain: Examining the Benefits of Pretrainng on Resource Rich Tasks , 2020, ACL.
[12] Yaohui Jin,et al. Multi-Task Label Embedding for Text Classification , 2017, EMNLP.
[13] Emily Denton,et al. Characterising Bias in Compressed Models , 2020, ArXiv.
[14] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[15] Trapit Bansal,et al. Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks , 2020, EMNLP.
[16] Phil Blunsom,et al. Mogrifier LSTM , 2020, ICLR.
[17] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[18] Tal Linzen,et al. How Can We Accelerate Progress Towards Human-like Linguistic Generalization? , 2020, ACL.
[19] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[20] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[21] Sara Hooker,et al. The hardware lottery , 2020, Commun. ACM.
[22] Boaz Barak,et al. Deep double descent: where bigger models and more data hurt , 2019, ICLR.
[23] Mark Goadrich,et al. The relationship between Precision-Recall and ROC curves , 2006, ICML.
[24] Hossein Mobahi,et al. Fantastic Generalization Measures and Where to Find Them , 2019, ICLR.
[25] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[26] Hinrich Schütze,et al. Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking , 2019, AAAI.
[27] Xu Sun,et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate , 2019, ICLR.
[28] Ali Farhadi,et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping , 2020, ArXiv.
[29] R. Thomas McCoy,et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference , 2019, ACL.
[30] Danqi Chen,et al. of the Association for Computational Linguistics: , 2001 .
[31] Barbara Plank,et al. MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding , 2019, RepL4NLP@ACL.