Personality testing of GPT-3: Limited temporal reliability, but highlighted social desirability of GPT-3's personality instruments results

To assess the potential applications and limitations of the GPT-3 chatbot Davinci-003, this study examined the chatbot's personality profile and the temporal reliability of personality questionnaires administered to it. Psychological questionnaires were administered to the chatbot on two separate occasions, and its responses were then compared with human normative data. Agreement between the two administrations varied across scales: some showed excellent agreement, while others showed poor agreement. Overall, Davinci-003 displayed a socially desirable and pro-social personality profile, particularly in the domain of communion. However, it remains uncertain whether the chatbot's responses are driven by conscious self-reflection or by predetermined algorithms.
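As an illustration of what agreement between two administrations can mean in practice, the sketch below computes unweighted Cohen's kappa for item-level responses from two sessions. The abstract does not state which agreement statistic the study used, so the choice of kappa is an assumption. The function name `cohens_kappa` and the response values are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa between two equally long lists of item responses."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty response lists of equal length")
    n = len(first)
    # Observed agreement: share of items answered identically on both occasions.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement, from the marginal response frequencies of each occasion.
    counts_1, counts_2 = Counter(first), Counter(second)
    p_chance = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both occasions used one and the same response category
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical 5-point Likert responses to the same eight items (illustrative only).
session_1 = [5, 4, 4, 2, 1, 3, 5, 4]
session_2 = [5, 4, 3, 2, 1, 3, 4, 4]
kappa = cohens_kappa(session_1, session_2)
```

Unweighted kappa treats every disagreement as equally severe. For ordinal Likert items, a weighted kappa or an intraclass correlation on scale scores may be more appropriate, depending on the analysis.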
