CAN MACHINES LEARN MORALITY? THE DELPHI EXPERIMENT

As AI systems become increasingly powerful and pervasive, there are growing concerns about machines’ morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among the most intensely debated questions in humanity, let alone for AI. Existing AI systems deployed to millions of users, however, are already making decisions loaded with moral implications, which poses a seemingly impossible challenge: teaching machines moral sense, while humanity continues to grapple with it. To explore this challenge, we introduce Delphi, an experimental framework based on deep neural networks trained directly to reason about descriptive ethical judgments, e.g., “helping a friend” is generally good, while “helping a friend spread fake news” is not. Empirical results shed novel insights on the promises and limits of machine ethics; Delphi demonstrates strong generalization capabilities in the face of novel ethical situations, while off-the-shelf neural network models exhibit markedly poor judgment including unjust biases, confirming the need for explicitly teaching machines moral sense. Yet, Delphi is not perfect, exhibiting susceptibility to pervasive biases and inconsistencies. Despite that, we demonstrate positive use cases of imperfect Delphi, including using it as a component model within other imperfect AI systems. Importantly, we interpret the operationalization of Delphi in light of prominent ethical theories, which leads us to important future research questions.

Ronan Le Bras | Jena D. Hwang | Regina A. Rini | Yejin Choi | Chandra Bhagavatula | Oren Etzioni | Jesse Dodge | Liwei Jiang | Maxwell Forbes | Keisuke Sakaguchi | Yulia Tsvetkov | Jenny Liang | Saadia Gabriel | Maarten Sap | Jon Borchardt | Jenny T Liang

[1] Kathleen C. Fraser,et al. Does Moral Code have a Moral Code? Probing Delphi’s Moral Philosophy , 2022, TRUSTNLP.

[2] Yejin Choi,et al. ProsocialDialog: A Prosocial Backbone for Conversational Agents , 2022, EMNLP.

[3] Yejin Choi,et al. Aligning to Social Norms and Values in Interactive Narratives , 2022, NAACL.

[4] Xi Victoria Lin,et al. OPT: Open Pre-trained Transformer Language Models , 2022, ArXiv.

[5] C. Klein,et al. Mapping Topics in 100,000 Real-life Moral Dilemmas , 2022, ICWSM.

[6] Le Sun,et al. Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View , 2022, ACL.

[7] J. Weinstein,et al. System Error: Where Big Tech Went Wrong and How We Can Reboot , 2022, Perspectives on Science and Christian Faith.

[8] Michael Matthews,et al. The Alignment Problem: Machine Learning and Human Values , 2022, Personnel Psychology.

[9] J. Knobe. Philosophical Intuitions Are Surprisingly Stable Across both Demographic Groups and Situations , 2021, Filozofia Nauki.

[10] Noah A. Smith,et al. Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection , 2021, NAACL.

[11] Maya Indira Ganesh,et al. A Word on Machine Ethics: A Response to Jiang et al. (2021) , 2021, ArXiv.

[12] D. Song,et al. What Would Jiminy Cricket Do? Towards Agents That Behave Morally , 2021, NeurIPS Datasets and Benchmarks.

[13] Mai ElSherief,et al. Latent Hatred: A Benchmark for Understanding Implicit Hate Speech , 2021, EMNLP.

[14] Yejin Choi,et al. It’s not Rocket Science: Interpreting Figurative Language in Narratives , 2021, TACL.

[15] Jed R. Brubaker,et al. Understanding international perceptions of the severity of harmful content online , 2021, PloS one.

[16] Illah Reza Nourbakhsh,et al. AI ethics , 2021, Commun. ACM.

[17] Jennifer Chubb,et al. Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI , 2021, Int. J. Child Comput. Interact.

[18] Yejin Choi,et al. CommonsenseQA 2.0: Exposing the Limits of AI through Gamification , 2021, NeurIPS Datasets and Benchmarks.

[19] Kai-Wei Chang,et al. Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions? , 2021, FINDINGS.

[20] Li Lucy,et al. Gender and Representation Bias in GPT-3 Generated Stories , 2021, NUSE.

[21] Nebojsa Jojic,et al. GPT Perdetry Test: Generating new meanings for new words , 2021, NAACL.

[22] Ali Farhadi,et al. TuringAdvice: A Generative and Dynamic Evaluation of Language Use , 2021, NAACL.

[23] Nikolaos Aletras,et al. On the Ethical Limits of Natural Language Processing on Legal Text , 2021, FINDINGS.

[24] Jesse Dodge,et al. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus , 2021, EMNLP.

[25] Yejin Choi,et al. UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark , 2021, AAAI.

[26] Kate Crawford,et al. Atlas of AI , 2021, Perspectives on Science and Christian Faith.

[27] C. Rothkopf,et al. Large pre-trained language models contain human-like biases of what is right and wrong to do , 2021, Nature Machine Intelligence.

[28] Emily M. Bender,et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 , 2021, FAccT.

[29] Tim Weninger,et al. Analysis of Moral Judgement on Reddit , 2021, ArXiv.

[30] Douwe Kiela,et al. Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection , 2021, ACL.

[31] Yejin Choi,et al. Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences , 2020, EMNLP.

[32] Yejin Choi,et al. Social Chemistry 101: Learning to Reason about Social and Moral Norms , 2020, EMNLP.

[33] Yejin Choi,et al. Thinking Like a Skeptic: Defeasible Inference in Natural Language , 2020, FINDINGS.

[34] Alan W Black,et al. Case Study: Deontological Ethics in NLP , 2020, NAACL.

[35] Yejin Choi,et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models , 2020, FINDINGS.

[36] P. Railton. Ethical Learning, Natural and Artificial , 2020 .

[37] Eric Schwitzgebel,et al. Designing AI with Rights, Consciousness, Self-Respect, and Freedom , 2020, Ethics of Artificial Intelligence.

[38] Hinrich Schutze,et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners , 2020, NAACL.

[39] Yejin Choi,et al. Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes , 2020, AAAI.

[40] D. Song,et al. Aligning AI With Shared Human Values , 2020, ICLR.

[41] Solon Barocas,et al. Language (Technology) is Power: A Critical Survey of “Bias” in NLP , 2020, ACL.

[42] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.

[43] Timothy Baldwin,et al. Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis? , 2020, ACL.

[44] Kristian Kersting,et al. The Moral Choice Machine , 2020, Frontiers in Artificial Intelligence.

[45] S. Merz. Race after technology. Abolitionist tools for the new Jim Code , 2020, Ethnic and Racial Studies.

[46] Yejin Choi,et al. PIQA: Reasoning about Physical Commonsense in Natural Language , 2019, AAAI.

[47] Noah A. Smith,et al. Social Bias Frames: Reasoning about Social and Power Implications of Language , 2019, ACL.

[48] Michael Strand. Taken for Granted: The Remarkable Power of the Unremarkable , 2019, Contemporary Sociology: A Journal of Reviews.

[49] Peter J. Liu,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.

[50] Nanyun Peng,et al. The Woman Worked as a Babysitter: On Biases in Language Generation , 2019, EMNLP.

[51] Yejin Choi,et al. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning , 2019, EMNLP.

[52] Doug Downey,et al. Abductive Commonsense Reasoning , 2019, ICLR.

[53] Nathalie A. Smuha. The EU Approach to Ethics Guidelines for Trustworthy Artificial Intelligence , 2019, Computer Law Review International.

[54] Ronan Le Bras,et al. WinoGrande , 2019, AAAI.

[55] Aida Mostafazadeh Davani,et al. Bound in Hatred: The role of group-based morality in acts of hate , 2019 .

[56] Paul N. Bennett,et al. Guidelines for Human-AI Interaction , 2019, CHI.

[57] Ali Farhadi,et al. HellaSwag: Can a Machine Really Finish Your Sentence? , 2019, ACL.

[58] The marked and the unmarked , 2018, Taken for Granted.

[59] Lucy Vasserman,et al. Measuring and Mitigating Unintended Bias in Text Classification , 2018, AIES.

[60] Yochanan E. Bigman,et al. People are averse to machines making moral decisions , 2018, Cognition.

[61] Oren Etzioni,et al. Point: Should AI technology be regulated? , 2018, Commun. ACM.

[62] Inioluwa Deborah Raji,et al. Model Cards for Model Reporting , 2018, FAT.

[63] Bryce Huebner,et al. Norms in the Wild: How to Diagnose, Measure, and Change Social Norms , 2018, The Philosophical Review.

[64] Francesca Rossi,et al. Building Trust in Artificial Intelligence , 2018 .

[65] José V. Hernández-Conde,et al. Estimating the Reproducibility of Experimental Philosophy , 2018, Review of Philosophy and Psychology.

[66] Iyad Rahwan,et al. A Computational Model of Commonsense Moral Decision Making , 2018, AIES.

[67] Gonçalo Duarte Garcia Pereira,et al. Integrating social power into the decision-making of cognitive agents , 2016, Artif. Intell.

[68] Nathanael Chambers,et al. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories , 2016, ArXiv.

[69] Jean-Gabriel Ganascia,et al. Modelling Moral Reasoning and Ethical Responsibility with Logic Programming , 2015, LPAR.

[70] Maya Cakmak,et al. Power to the People: The Role of Humans in Interactive Machine Learning , 2014, AI Mag.

[71] Gautham J. Mysore,et al. ISSE: an interactive source separation editor , 2014, CHI.

[72] S. Street,et al. Coming to Terms with Contingency: Humean Constructivism about Practical Reason , 2012 .

[73] D. Parfit,et al. On What Matters , 2011 .

[74] Naomi Ellemers,et al. Thou shalt not discriminate: How emphasizing moral ideals rather than obligations increases Whites' support for social equality , 2011 .

[75] Susan Leigh Anderson,et al. Asimov’s “three laws of robotics” and machine metaethics , 2008, AI & SOCIETY.

[76] Luís Moniz Pereira,et al. Modelling morality with prospective logic , 2007, Int. J. Reason. based Intell. Syst.

[77] Joshua D. Greene,et al. A Dissociation Between Moral Judgments and Justifications , 2007 .

[78] Richard N. Boyd,et al. Finite Beings, Finite Goods: The Semantics, Metaphysics and Ethics of Naturalist Consequentialism, Part II , 2003 .

[79] Nancy S. Jecker,et al. The Sources of Normativity , 2001 .

[80] M. Ungar. State Violence and Lesbian, Gay, Bisexual and Transgender (lgbt) Rights , 2000 .

[81] Oren Etzioni,et al. The First Law of Robotics (A Call to Arms) , 1994, AAAI.

[82] T. Nagel. The view from nowhere , 1987 .

[83] John Haugeland,et al. Artificial intelligence - the very idea , 1987 .

[84] Candace L. Sidner,et al. Attention, Intentions, and the Structure of Discourse , 1986, CL.

[85] Norman Daniels,et al. Wide Reflective Equilibrium and Theory Acceptance in Ethics , 1979 .

[86] J. Mackie,et al. Ethics: Inventing Right and Wrong , 1977 .

[87] R. Stammler,et al. A Theory of Justice , 1971, Princeton Readings in Political Thought.

[88] John D. Rawls,et al. Outline of a Decision Procedure for Ethics , 1951 .

[89] Elham J. Barezi,et al. AiSocrates: Towards Answering Ethical Quandary Questions , 2022, ArXiv.

[90] K. Kersting,et al. Language Models have a Moral Dimension , 2021, ArXiv.

[91] Ronan Le Bras,et al. Delphi: Towards Machine Ethics and Norms , 2021, ArXiv.

[92] R. Abeyratne. Regulating Artificial Intelligence , 2019, Legal Priorities in Air Transport.

[93] Loren Wadle. THE THEORY OF MORAL SENTIMENTS , 2019 .

[94] J. Horvat. THE ETHICS OF ARTIFICIAL INTELLIGENCE , 2016 .

[95] M. Huemer. Consequentialism and Fairness , 2013 .

[96] James H. Moor,et al. Machine Ethics: The Nature, Importance, and Difficulty of Machine Ethics , 2011 .

[97] Dieter Schönecker,et al. Immanuel Kant: Groundwork of the Metaphysics of Morals , 2011 .

[98] Kevin W. Saunders. What about Hate Speech , 2011 .

[99] Hilary Charlesworth. The Universal Declaration of Human Rights , 2017 .

[100] Anthony A. Aaby,et al. Computational Ethics , 2007, Encyclopedia of Information Ethics and Security.

[101] D. B. Wong. Natural Moralities: A Defense of Pluralistic Relativism , 2006 .

[102] M. Quackenbush. To Kiss or Not to Kiss , 1996 .

[103] John Mikhail,et al. Universal moral grammar: theory, evidence and the future , 2007, Trends in Cognitive Sciences.