Analyzing the Limits of Self-Supervision in Handling Bias in Language

Prompting inputs with natural language task descriptions has emerged as a popular mechanism to elicit reasonably accurate outputs from large-scale generative language models with little to no in-context supervision. This also helps gain insight into how well language models capture the semantics of a wide range of downstream tasks purely from self-supervised pre-training on massive corpora of unlabeled text. Such models have inevitably also been exposed to a great deal of undesirable content, such as racist and sexist language, and there is limited work on how aware models are of their content along these dimensions. In this paper, we define and comprehensively evaluate how well such language models capture the semantics of four bias-related tasks: diagnosis, identification, extraction, and rephrasing. We define three broad classes of task descriptions for these tasks: statement, question, and completion, with numerous lexical variants within each class. We study the efficacy of prompting for each task using these classes and the null task description, across several decoding methods and few-shot examples. Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation. We believe our work is an important step towards unbiased language models by quantifying the limits of current self-supervision objectives at accomplishing such sociologically challenging tasks.
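
To make the prompting setup concrete, the sketch below constructs statement-, question-, and completion-style task descriptions for the bias-diagnosis task and queries a generative language model for a short continuation. The specific prompt wordings, the example sentence, and the use of the Hugging Face `pipeline` API with `gpt2` are assumptions made for illustration; they are not the exact templates, models, or decoding configurations evaluated in the paper.

```python
# A minimal sketch of prompting for bias diagnosis with three
# hypothetical task-description classes (statement, question,
# completion). Templates, example text, and model choice are
# illustrative assumptions, not the paper's exact setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # any causal LM works

text = "Women are bad at math."

prompts = {
    # Statement-style task description.
    "statement": f'"{text}" The previous text contains gender bias. True or False:',
    # Question-style task description.
    "question": f'"{text}" Does the previous text contain gender bias? Yes or No:',
    # Completion-style task description.
    "completion": f'"{text}" The previous text is biased against',
}

for style, prompt in prompts.items():
    # Greedy decoding for simplicity; the paper also compares
    # other decoding methods and few-shot settings.
    out = generator(prompt, max_new_tokens=8, do_sample=False)
    continuation = out[0]["generated_text"][len(prompt):].strip()
    print(f"{style:>10}: {continuation}")
```

Swapping greedy decoding for sampling-based decoding, or prepending a few labeled demonstrations to each prompt, would correspond to the decoding and few-shot variations studied in the paper.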
