Automatic Profiling of Twitter Users Based on Their Tweets: Notebook for PAN at CLEF 2015

In this paper we describe our approach to the PAN Author Profiling task. We introduce a novel way of computing an author's type/token ratio and show that, although strong correlations between high extroversion and low type/token ratios have been observed in the past, this ratio is not necessarily a strong indicator of extroversion. Since a person's writing is influenced by all seven attributes that this task requires to be identified automatically (gender, age, and the Big Five personality traits), we used this ratio, along with term frequency-inverse document frequency (tf-idf) matrices, in all seven subtasks and on all four corpora, and obtained good results.
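
To make the two feature types concrete, the following is a minimal sketch of a standard type/token ratio and a textbook tf-idf weighting. It is not the novel type/token computation introduced in the paper, which is not specified in this abstract. The function names, the regex tokenizer, and the unsmoothed idf formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercased word-like tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(tweets):
    # Standard type/token ratio over all of an author's tweets:
    # number of distinct tokens divided by total number of tokens.
    # (Assumption: this is the conventional definition, not the paper's variant.)
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def tfidf(documents):
    # Textbook tf-idf, one sparse dict per document:
    # tf = term count / document length, idf = log(N / document frequency).
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return rows
```

In a setup like the one the abstract describes, each author's ratio could be appended as one extra column next to that author's tf-idf row before training a classifier or regressor per subtask. That combination step is likewise an assumption about how the features fit together.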
