Predicting Personality Traits from Spontaneous Modern Greek Text: Overcoming the Barriers

The present work aims to identify relations between the morphosyntactic and semantic properties of an author’s writing and his/her personality traits. Machine learning schemata are used either to classify an author according to the values of the Big Five traits or to predict the numerical values of those traits. Unlike related work, the current approach focuses on Modern Greek text and relies on the limited data and resources available for it. Meta-learning and synthetic oversampling help overcome the small size of the dataset and its imbalanced class distribution.
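
To make the synthetic oversampling step concrete, the following is a minimal sketch that assumes a SMOTE-style interpolation between minority-class samples; the abstract does not spell out the exact procedure. The function name `smote`, its parameters, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority-class
    samples and their nearest minority-class neighbours (SMOTE-style).

    Illustrative only: not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, ordered by Euclidean distance to the base point.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for an under-represented trait class.
minority_class = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.5], [0.5, 0.5]]
extra_samples = smote(minority_class, n_synthetic=6, k=2, seed=1)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without duplicating any author's feature vector verbatim.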
