AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Jack G. M. FitzGerald | Stephen Rawls | Anna Rumshisky | G. Tur | Wael Hamza | Apurv Verma | Charith S. Peris | Mukund Sridhar | Chandan Prakash | Saleh Soltan | Rahul Gupta | Andrew Rosenbaum | Premkumar Natarajan | Shankar Ananthakrishnan | Haidar Khan | Fabian Triefenbach
[1] Lisa Anne Hendricks,et al. Taxonomy of Risks posed by Language Models , 2022, FAccT.
[2] Dilek Z. Hakkani-Tür,et al. Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems , 2022, KDD.
[3] R. Zemel,et al. Differentially Private Decoding in Large Language Models , 2022, ArXiv.
[4] S. Gu,et al. Large Language Models are Zero-Shot Reasoners , 2022, NeurIPS.
[5] Xi Victoria Lin,et al. OPT: Open Pre-trained Transformer Language Models , 2022, ArXiv.
[6] Hyung Won Chung,et al. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? , 2022, ICML.
[7] Andrew M. Dai,et al. PaLM: Scaling Language Modeling with Pathways , 2022, J. Mach. Learn. Res..
[8] Lisa Anne Hendricks,et al. Training Compute-Optimal Large Language Models , 2022, ArXiv.
[9] Kai-Wei Chang,et al. Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal , 2022, Findings.
[10] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[11] Florian Tramèr,et al. Quantifying Memorization Across Neural Language Models , 2022, ICLR.
[12] Colin Raffel,et al. Deduplicating Training Data Mitigates Privacy Risks in Language Models , 2022, ICML.
[13] Dale Schuurmans,et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models , 2022, NeurIPS.
[14] Xi Victoria Lin,et al. Few-shot Learning with Multilingual Generative Language Models , 2021, EMNLP.
[15] Quoc V. Le,et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts , 2021, ICML.
[16] Alexander M. Rush,et al. Multitask Prompted Training Enables Zero-Shot Task Generalization , 2021, ICLR.
[17] Po-Sen Huang,et al. Challenges in Detoxifying Language Models , 2021, EMNLP.
[18] Alexander M. Rush,et al. Datasets: A Community Library for Natural Language Processing , 2021, EMNLP.
[19] Nicholas Carlini,et al. Deduplicating Training Data Makes Language Models Better , 2021, ACL.
[20] Shannon L. Spruit,et al. Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling , 2021, ArXiv.
[21] Ruslan Salakhutdinov,et al. Towards Understanding and Mitigating Social Biases in Language Models , 2021, ICML.
[22] Max Ryabinin,et al. It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning , 2021, Findings.
[23] Marc'Aurelio Ranzato,et al. The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation , 2021, TACL.
[24] Kai-Wei Chang,et al. Societal Biases in Language Generation: Progress and Challenges , 2021, ACL.
[25] David R. So,et al. Carbon Emissions and Large Neural Network Training , 2021, ArXiv.
[26] Brian Lester,et al. The Power of Scale for Parameter-Efficient Prompt Tuning , 2021, EMNLP.
[27] Timo Schick,et al. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP , 2021, Transactions of the Association for Computational Linguistics.
[28] Diyi Yang,et al. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics , 2021, GEM.
[29] Kai-Wei Chang,et al. BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation , 2021, FAccT.
[30] Milad Nasr,et al. Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning , 2021, IEEE Symposium on Security and Privacy (SP).
[31] Colin Raffel,et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer , 2020, NAACL.
[32] Holger Schwenk,et al. Beyond English-Centric Multilingual Machine Translation , 2020, J. Mach. Learn. Res..
[33] Yejin Choi,et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models , 2020, Findings.
[34] Olatunji Ruwase,et al. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters , 2020, KDD.
[35] Edouard Grave,et al. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering , 2020, EACL.
[36] Jonathan Ullman,et al. Auditing Differentially Private Machine Learning: How Private is Private SGD? , 2020, NeurIPS.
[37] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[38] Solon Barocas,et al. Language (Technology) is Power: A Critical Survey of “Bias” in NLP , 2020, ACL.
[39] A. Korhonen,et al. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning , 2020, EMNLP.
[40] Sylvain Lamprier,et al. MLSUM: The Multilingual Summarization Corpus , 2020, EMNLP.
[41] Mary Williamson,et al. Recipes for Building an Open-Domain Chatbot , 2020, EACL.
[42] Jianfeng Gao,et al. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training , 2020, ICML.
[43] Tomohide Shibata. Understand It in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .
[44] Tie-Yan Liu,et al. On Layer Normalization in the Transformer Architecture , 2020, ICML.
[45] Marjan Ghazvininejad,et al. Multilingual Denoising Pre-training for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.
[46] J. Weston,et al. Queens Are Powerful Too: Mitigating Gender Bias in Dialogue Generation , 2019, EMNLP.
[47] Lijun Wu,et al. Microsoft Research Asia’s Systems for WMT19 , 2019, WMT.
[48] Myle Ott,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[49] Verena Rieser,et al. Semantic Noise Matters for Neural Natural Language Generation , 2019, INLG.
[50] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[51] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[52] Jiliang Tang,et al. Does Gender Matter? Towards Fairness in Dialogue Systems , 2019, COLING.
[53] Lysandre Debut,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[54] Rémi Louf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[55] J. Yosinski,et al. Plug and Play Language Models: A Simple Approach to Controlled Text Generation , 2019, ICLR.
[56] M. Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[57] Nanyun Peng,et al. The Woman Worked as a Babysitter: On Biases in Language Generation , 2019, EMNLP.
[58] J. M. Phillips,et al. On Measuring and Mitigating Biased Inferences of Word Embeddings , 2019, AAAI.
[59] Jason Weston,et al. Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack , 2019, EMNLP.
[60] Jason Baldridge,et al. PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification , 2019, EMNLP.
[61] Alan W Black,et al. Quantifying Social Biases in Contextual Word Representations , 2019, ACL.
[62] Xiaodong Liu,et al. Unified Language Model Pre-training for Natural Language Understanding and Generation , 2019, NeurIPS.
[63] Xu Tan,et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation , 2019, ICML.
[64] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[65] David Evans,et al. Evaluating Differentially Private Machine Learning in Practice , 2019, USENIX Security Symposium.
[66] Guillaume Lample,et al. XNLI: Evaluating Cross-lingual Sentence Representations , 2018, EMNLP.
[67] Mirella Lapata,et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization , 2018, EMNLP.
[68] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[69] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.
[70] Rachel Rudinger,et al. Gender Bias in Coreference Resolution , 2018, NAACL.
[71] Verena Rieser,et al. The E2E Dataset: New Challenges For End-to-End Generation , 2017, SIGDIAL Conference.
[72] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[73] Dan Roth,et al. Solving General Arithmetic Word Problems , 2016, EMNLP.
[74] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[75] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL.
[77] Vinh Q. Tran,et al. Unifying Language Learning Paradigms , 2022, ArXiv.
[78] Percy Liang,et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation , 2021, ACL.
[79] Michael White,et al. Structure-to-Text Generation with Self-Training, Acceptability Classifiers and Context-Conditioning for the GEM Shared Task , 2021, GEM.
[80] Thiago Castro Ferreira,et al. The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020) , 2020, WEBNLG.
[81] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[82] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[83] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .