Hyung Won Chung | Thibault Févry | Henry Tsai | Melvin Johnson | Sebastian Ruder