Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench
Zhaopeng Tu | Wenxiang Jiao | Jen-tse Huang | Michael R. Lyu | Shujie Ren | Man Ho Adrian Lam | Wenxuan Wang | Eric Li