Whose Opinions Do Language Models Reflect?
[1] Jochen Hartmann, et al. The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation, 2023, SSRN Electronic Journal.
[2] D. Wingate, et al. Out of One, Many: Using Language Models to Simulate Human Samples, 2022, Political Analysis.
[3] Eric Schulz, et al. Using cognitive psychology to understand GPT-3, 2022, Proceedings of the National Academy of Sciences of the United States of America.
[4] Tom B. Brown, et al. Discovering Language Model Behaviors with Model-Written Evaluations, 2022, ACL.
[5] Tom B. Brown, et al. Constitutional AI: Harmlessness from AI Feedback, 2022, ArXiv.
[6] Christopher D. Manning, et al. Holistic Evaluation of Language Models, 2023, Annals of the New York Academy of Sciences.
[7] Lisa Anne Hendricks, et al. Improving alignment of dialogue agents via targeted human judgements, 2022, ArXiv.
[8] Gabriel Simmons. Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity, 2022, ArXiv.
[9] Doug Beeferman, et al. CommunityLM: Probing Partisan Worldviews from Language Models, 2022, COLING.
[10] Tom B. Brown, et al. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned, 2022, ArXiv.
[11] A. Kalai, et al. Using Large Language Models to Simulate Multiple Humans, 2022, ArXiv.
[12] Michael S. Bernstein, et al. Social Simulacra: Creating Populated Prototypes for Social Computing Systems, 2022, UIST.
[13] Ian Stewart, et al. Surfacing Racial Stereotypes through Identity Portrayal, 2022, FAccT.
[14] Gerard de Melo, et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2022, ArXiv.
[15] I. Kivlichan, et al. Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation, 2022, Proc. ACM Hum. Comput. Interact.
[16] Saketh Reddy Karra, et al. AI Personification: Estimating the Personality of Language Models, 2022, ArXiv.
[17] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[18] Geoffrey Irving, et al. Red Teaming Language Models with Language Models, 2022, EMNLP.
[19] Michael S. Bernstein, et al. Jury Learning: Integrating Dissenting Voices into Machine Learning Models, 2022, CHI.
[20] Noah A. Smith, et al. Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection, 2021, NAACL.
[21] Vinodkumar Prabhakaran, et al. Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations, 2021, TACL.
[22] Dario Amodei, et al. A General Language Assistant as a Laboratory for Alignment, 2021, ArXiv.
[23] Dongwon Lee, et al. TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation, 2021, EMNLP.
[24] Jason Weston, et al. Bot-Adversarial Dialogue for Safe Conversational Agents, 2021, NAACL.
[25] Michael S. Bernstein, et al. The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality, 2021, CHI.
[26] Ting Yan. Consequences of Asking Sensitive Questions in Surveys, 2021.
[27] Kai-Wei Chang, et al. BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation, 2021, FAccT.
[28] Dawn Song, et al. Measuring Massive Multitask Language Understanding, 2020, ICLR.
[29] Yejin Choi, et al. Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes, 2020, AAAI.
[30] Siva Reddy, et al. StereoSet: Measuring stereotypical bias in pretrained language models, 2020, ACL.
[31] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[32] Ellie Pavlick, et al. Inherent Disagreements in Human Textual Inferences, 2019, Transactions of the Association for Computational Linguistics.
[33] Alexandra Chouldechova, et al. Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting, 2019, FAT.
[34] Kilian Q. Weinberger, et al. On Calibration of Modern Neural Networks, 2017, ICML.
[35] Adam J. Berinsky, et al. Measuring Public Opinion with Surveys, 2017.
[36] Cecilia Ovesdotter Alm. Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications, 2011, ACL.
[37] J. Henrich, et al. The weirdest people in the world?, 2010, Behavioral and Brain Sciences.
[38] Willem E. Saris, et al. Studies in public opinion: attitudes, nonattitudes, measurement error, and change, 2004.
[39] C. B. Colby. The weirdest people in the world, 1973.