Paper Information - Whose Opinions Do Language Models Reflect?

Whose Opinions Do Language Models Reflect?

Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction and on the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs by leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation. Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups, a gap on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards particular demographic groups. Our analysis not only confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs, but also surfaces groups whose opinions are poorly reflected by current LMs (e.g., individuals aged 65+ and widowed individuals). Our code and data are available at https://github.com/tatsu-lab/opinions_qa.
