Personality Traits on Twitter—or—How to Get 1,500 Personality Tests in a Week

Psychology research suggests that certain personality traits correlate with linguistic behavior. This correlation can be effectively modeled with statistical natural language processing techniques. Prediction accuracy generally improves with larger data samples, which also allow for more lexical features. Most existing work on personality prediction, however, focuses on small samples and closed-vocabulary investigations. Both factors limit the generality and statistical power of the results. In this paper, we explore the use of social media as a resource for large-scale, open-vocabulary personality detection. We analyze which features are predictive of which personality traits, and present a novel corpus of 1.2M English tweets annotated with Myers-Briggs personality type and gender. Our experiments show that social media data can provide sufficient linguistic evidence to reliably predict two of four personality dimensions.
