暂无分享,去创建一个
[1] Yoav Goldberg,et al. Towards better substitution-based word sense induction , 2019, ArXiv.
[2] Holger Schwenk,et al. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond , 2018, Transactions of the Association for Computational Linguistics.
[3] Mikel Artetxe,et al. On the Cross-lingual Transferability of Monolingual Representations , 2019, ACL.
[4] Anna Korhonen,et al. An Unsupervised Model for Instance Level Subcategorization Acquisition , 2014, EMNLP.
[5] Eunsol Choi,et al. TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages , 2020, Transactions of the Association for Computational Linguistics.
[6] Veselin Stoyanov,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[7] Jonathan Berant,et al. oLMpics-On What Language Model Pre-training Captures , 2019, Transactions of the Association for Computational Linguistics.
[8] Zhe Gan,et al. FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding , 2020, AAAI.
[9] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[10] Evgeniy Gabrilovich,et al. Large-scale learning of word relatedness with constraints , 2012, KDD.
[11] Sebastian Ruder,et al. Universal Language Model Fine-tuning for Text Classification , 2018, ACL.
[12] Graham Neubig,et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization , 2020, ICML.
[13] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[14] Dan Roth,et al. Cross-Lingual Ability of Multilingual BERT: An Empirical Study , 2019, ICLR.
[15] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[16] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[17] Yonatan Belinkov,et al. Linguistic Knowledge and Transferability of Contextual Representations , 2019, NAACL.
[18] Christopher D. Manning,et al. Better Word Representations with Recursive Neural Networks for Morphology , 2013, CoNLL.
[19] Phil Blunsom,et al. Mogrifier LSTM , 2020, ICLR.
[20] Subhabrata Mukherjee,et al. XtremeDistil: Multi-stage Distillation for Massive Multilingual Models , 2020, ACL.
[21] Iryna Gurevych,et al. MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer , 2020, EMNLP.
[22] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[23] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[24] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[25] Dustin Tran,et al. Mesh-TensorFlow: Deep Learning for Supercomputers , 2018, NeurIPS.
[26] Yiming Yang,et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices , 2020, ACL.
[27] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[28] Ming-Wei Chang,et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models , 2019 .
[29] Fedor Moiseev,et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.
[30] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[31] Mark Dredze,et al. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT , 2019, EMNLP.
[32] Jason Baldridge,et al. PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification , 2019, EMNLP.
[33] Zhiyu Chen,et al. HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing , 2020, EACL.
[34] Elia Bruni,et al. Multimodal Distributional Semantics , 2014, J. Artif. Intell. Res..
[35] Arvind Narayanan,et al. Semantics derived automatically from language corpora contain human-like biases , 2016, Science.
[36] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[37] Sebastian Riedel,et al. MLQA: Evaluating Cross-lingual Extractive Question Answering , 2019, ACL.
[38] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[39] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.
[40] Guillaume Lample,et al. XNLI: Evaluating Cross-lingual Sentence Representations , 2018, EMNLP.
[41] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[42] Eva Schlinger,et al. How Multilingual is Multilingual BERT? , 2019, ACL.
[43] Thomas Wolf,et al. Transfer Learning in Natural Language Processing , 2019, NAACL.
[44] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[45] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[46] Noam Shazeer,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2021, ICLR 2021 Poster.
[47] Luo Si,et al. VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation , 2020, ArXiv.
[48] Mohammad Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[49] Noah Goodman,et al. Investigating Transferability in Pretrained Language Models , 2020, EMNLP.