Efficient Large Scale Language Modeling with Mixtures of Experts
Xi Victoria Lin | Punit Singh Koura | Myle Ott | Naman Goyal | Jingfei Du | Luke Zettlemoyer | Mikel Artetxe | Ramakanth Pasunuru | Zornitsa Kozareva | Todor Mihaylov | Vishrav Chaudhary | Shruti Bhosale | Tianlu Wang | Ves Stoyanov | Daniel Simig | Sam Shleifer | Shuohui Chen | Xian Li | Brian O'Horo | Jeff Wang | Mona T. Diab
[1] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[2] Alexander M. Rush, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization, 2021, ICLR.
[3] Pascale Fung, et al. Language Models are Few-shot Multilingual Learners, 2021, MRL.
[4] Hinrich Schütze, et al. Discrete and Soft Prompting for Multilingual Models, 2021, EMNLP.
[5] Fei Huang, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners, 2021, ICLR.
[6] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, ArXiv.
[7] Max Ryabinin, et al. It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning, 2021, Findings of ACL.
[8] Marc'Aurelio Ranzato, et al. The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation, 2021, TACL.
[9] Douwe Kiela, et al. True Few-Shot Learning with Language Models, 2021, NeurIPS.
[10] Myle Ott, et al. Larger-Scale Transformers for Multilingual Masked Language Modeling, 2021, RepL4NLP.
[11] Hannaneh Hajishirzi, et al. Cross-Task Generalization via Natural Language Crowdsourcing Instructions, 2021, ACL.
[12] Jinlan Fu, et al. XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation, 2021, EMNLP.
[13] D. Klein, et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models, 2021, ICML.
[14] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[15] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.
[16] Colin Raffel, et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer, 2020, NAACL.
[17] Graham Neubig, et al. X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models, 2020, EMNLP.
[18] Samuel R. Bowman, et al. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models, 2020, EMNLP.
[19] D. Song, et al. Aligning AI With Shared Human Values, 2020, ICLR.
[20] Jieyu Zhao, et al. Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer, 2020, ACL.
[21] A. Korhonen, et al. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning, 2020, EMNLP.
[22] Siva Reddy, et al. StereoSet: Measuring Stereotypical Bias in Pretrained Language Models, 2020, ACL.
[23] Eneko Agirre, et al. Translation Artifacts in Cross-lingual Transfer Learning, 2020, EMNLP.
[24] Franck Dernoncourt, et al. Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition, 2020, LREC.
[25] Timo Schick, et al. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference, 2020, EACL.
[26] Yejin Choi, et al. PIQA: Reasoning about Physical Commonsense in Natural Language, 2019, AAAI.
[27] Myle Ott, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[28] Vishrav Chaudhary, et al. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, 2019, LREC.
[29] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[30] Myle Ott, et al. On The Evaluation of Machine Translation Systems Trained With Back-Translation, 2019, ACL.
[31] Jason Baldridge, et al. PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification, 2019, EMNLP.
[32] Ronan Le Bras, et al. WinoGrande: An Adversarial Winograd Schema Challenge at Scale, 2019, AAAI.
[33] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[34] Ali Farhadi, et al. HellaSwag: Can a Machine Really Finish Your Sentence?, 2019, ACL.
[35] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[36] Hugo Larochelle, et al. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples, 2019, ICLR.
[37] Alexandra Chouldechova, et al. Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting, 2019, FAT*.
[38] Guillaume Lample, et al. XNLI: Evaluating Cross-lingual Sentence Representations, 2018, EMNLP.
[39] Peter Clark, et al. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, 2018, EMNLP.
[40] Quoc V. Le, et al. A Simple Method for Commonsense Reasoning, 2018, ArXiv.
[41] Christophe Gravier, et al. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples, 2018, LREC.
[42] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.
[43] Oren Etzioni, et al. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge, 2018, ArXiv.
[44] Nathanael Chambers, et al. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories, 2016, ArXiv.
[45] Alexandra Birch, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[46] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, ICCV.
[47] Quoc V. Le, et al. Exploiting Similarities among Languages for Machine Translation, 2013, ArXiv.
[48] Jörg Tiedemann, et al. Parallel Data, Tools and Interfaces in OPUS, 2012, LREC.
[49] Kenneth Heafield, et al. KenLM: Faster and Smaller Language Model Queries, 2011, WMT@EMNLP.
[50] Zornitsa Kozareva, et al. SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning, 2011, SemEval.
[51] Luke Zettlemoyer, et al. Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models, 2022, ArXiv.
[52] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[53] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.