Whose Opinions Do Language Models Reflect?

Language models (LMs) are increasingly being used in open-ended contexts, where the opinions they reflect in response to subjective queries can have a profound impact, both on user satisfaction and on the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs by leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation. Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups: on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards particular demographic groups. Our analysis not only confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs, but also surfaces groups whose opinions are poorly reflected by current LMs (e.g., 65+ and widowed individuals). Our code and data are available at https://github.com/tatsu-lab/opinions_qa.
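As a concrete illustration of the kind of measurement such a framework enables, the sketch below compares an LM's answer distribution over a multiple-choice survey question with a demographic group's response distribution. The specific metric (1-Wasserstein distance over ordered answer options), the normalization to a [0, 1] alignment score, and all names and example numbers are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumptions, not the paper's implementation): score how closely
# an LM's answer distribution over an ordinal multiple-choice survey question
# matches a demographic group's response distribution.
import numpy as np


def wasserstein_1(p: np.ndarray, q: np.ndarray) -> float:
    """1-Wasserstein distance between two distributions over ordered options.

    For a 1-D ordinal support with unit spacing, this equals the sum of
    absolute differences between the two cumulative distribution functions.
    """
    p = p / p.sum()
    q = q / q.sum()
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())


def alignment(lm_dist: np.ndarray, group_dist: np.ndarray) -> float:
    """Map the distance to a [0, 1] alignment score (1 = identical distributions)."""
    max_dist = len(lm_dist) - 1  # distance between point masses on the two extreme options
    return 1.0 - wasserstein_1(lm_dist, group_dist) / max_dist


# Hypothetical example: a 4-option question ("strongly agree" ... "strongly disagree").
lm_dist = np.array([0.10, 0.20, 0.30, 0.40])     # LM's normalized answer probabilities
group_dist = np.array([0.40, 0.30, 0.20, 0.10])  # one demographic group's survey responses
print(f"alignment: {alignment(lm_dist, group_dist):.3f}")
```

In this sketch, averaging such per-question scores over a survey would give one plausible aggregate measure of how well an LM reflects a given group's opinions.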
