Predicting Responses to Psychological Questionnaires from Participants’ Social Media Posts and Question Text Embeddings

Psychologists routinely assess people’s emotions and traits, such as their personality, by collecting their responses to survey questionnaires. Such assessments can be costly in both time and money, and they often lack generalizability, because existing data cannot be used to predict responses for new survey questions or new participants. In this study, we propose a method for predicting a participant’s questionnaire response from their social media texts and the text of the survey question they are asked. Specifically, we use Natural Language Processing (NLP) tools such as BERT embeddings to represent both participants (via the text they write) and survey questions as embedding vectors, allowing us to predict responses for out-of-sample participants and questions. Researchers can use our approach to integrate new participants or new questions into psychological studies without costly additional data collection, facilitating novel practical applications and furthering the development of psychological theory. Finally, as a side contribution, the success of our model suggests a new approach to studying survey questions, one that uses NLP tools such as text embeddings rather than the response data on which traditional methods rely.
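The core idea, representing a participant and a question as vectors and predicting the response from the pair, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: random vectors stand in for BERT embeddings, the pair is encoded by concatenation plus an element-wise product, the predictor is a closed-form ridge regression, and the names `pair_features`, `fit_ridge`, and `predict_response` are hypothetical.

```python
import numpy as np


def pair_features(participant_vec, question_vec):
    """Build one feature row from a participant embedding and a question embedding.

    Assumption: both embeddings have the same dimension, so an element-wise
    product can serve as a simple interaction term.
    """
    interaction = participant_vec * question_vec
    return np.concatenate([participant_vec, question_vec, interaction, [1.0]])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: solve (X^T X + alpha * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def predict_response(weights, participant_vec, question_vec):
    """Predict a response for any (participant, question) pair, seen or unseen."""
    return float(pair_features(participant_vec, question_vec) @ weights)


# Synthetic stand-ins for text embeddings (NOT real BERT output).
rng = np.random.default_rng(0)
dim = 8
participants = rng.normal(size=(20, dim))  # one vector per training participant
questions = rng.normal(size=(15, dim))     # one vector per training question

# Synthetic responses generated from a hidden linear rule, for illustration only.
true_weights = rng.normal(size=3 * dim + 1)
X = np.array([pair_features(p, q) for p in participants for q in questions])
y = X @ true_weights

weights = fit_ridge(X, y, alpha=1e-6)

# An out-of-sample participant and an out-of-sample question.
new_participant = rng.normal(size=dim)
new_question = rng.normal(size=dim)
predicted = predict_response(weights, new_participant, new_question)
```

Because the model sees only embeddings and never participant or question identifiers, the same fitted weights apply to pairs absent from the training data, which is what makes out-of-sample prediction possible in this sketch.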
