Personality Profiling from Text: Introducing Part-of-Speech N-Grams

A support vector machine is trained to classify the Five Factor personality traits of writers of free text. For each of the five personality dimensions, writers are classified as high/low, with the mean personality score for that dimension used as the dividing point. Writers are also separately classified as high/medium/low, with division points at one standard deviation above and below the mean. The average two-class accuracy under 5-fold cross-validation is 80.6%, much better than the baseline (pick the most likely class) accuracy of 50%. The three-class accuracy is only slightly better (7.4%) than its baseline, because most writers fall into the medium class due to the normal distribution of personality values. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (averaging 4.8% and 5.8% for the two-class and three-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
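The labeling scheme and the POS n-gram features described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example scores, and the Penn Treebank-style tags are all hypothetical. It also assumes that a score exactly equal to the mean counts as "low", that the population standard deviation is used, and that the POS tags have already been produced by some tagger. The abstract specifies none of these details.

```python
from collections import Counter
from statistics import mean, pstdev


def two_class_labels(scores):
    """Label each score high/low, splitting at the mean.

    Assumption: a score exactly equal to the mean is labeled "low".
    """
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class_labels(scores):
    """Label each score high/medium/low, splitting at mean +/- one SD.

    Assumption: population standard deviation (pstdev) is used.
    """
    m = mean(scores)
    sd = pstdev(scores)
    labels = []
    for s in scores:
        if s > m + sd:
            labels.append("high")
        elif s < m - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Hypothetical trait scores for five writers.
scores = [1, 2, 3, 4, 5]
print(two_class_labels(scores))
print(three_class_labels(scores))

# Hypothetical, pre-tagged token sequence (Penn Treebank-style tags).
tags = ["PRP", "VBP", "DT", "NN", "PRP", "VBP"]
print(pos_ngrams(tags, 2).most_common(1))
```

Under a normal distribution roughly 68% of values lie within one standard deviation of the mean, so a majority-class guesser in the three-class setting is already right about two times in three. This is why the three-class baseline is much harder to beat than the 50% two-class baseline.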
