Promises and Perils of Inferring Personality on GitHub

Background: Personality plays a pivotal role in our understanding of human actions and behavior. Today, the applications of personality are widespread, built on solutions from psychology for inferring personality. Aim: In software engineering, for instance, one widely used solution infers personality from textual communication data. As studies on personality in software engineering continue to grow, it is imperative to understand the performance of these solutions. Method: This paper compares the inferential ability of three widely studied text-based personality tests against each other and against the ground truth on GitHub. We also explore the challenges and potential ways to improve the inferential ability of personality tests. Results: Our study shows that solutions for inferring personality are far from perfect. Software engineering communication data can infer individual developer personality with an average error rate of 41%. In the best case, the error rate can be reduced up to 36% by following our recommendations.
