Using a Wide Range of Features for Author Profiling

Predicting an author’s age, gender and personality traits from the documents they write is important in forensics, marketing and the resolution of authorship disputes. Our system combines stylistic, lexicon, topic and familial-token features with different categories of character n-grams to build a logistic regression model for four languages: English, Spanish, Italian and Dutch. With this model, we obtained global ranking scores of 0.6623, 0.6547, 0.7411 and 0.7662 for English, Spanish, Italian and Dutch, respectively.
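
As a rough illustration of how several feature groups might be merged into a single feature map before a classifier is fitted, the following Python sketch counts character n-grams and adds a few simple stylistic counts. The function names, the feature-name prefixes, the choice of n = 3 and the particular stylistic counts are assumptions made for illustration; they are not the system's actual feature set, and the logistic regression step itself is omitted.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of length n in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def style_features(text):
    """Return a few illustrative stylistic counts (hypothetical choices)."""
    tokens = text.split()
    return {
        "style:num_tokens": len(tokens),
        "style:num_exclamations": text.count("!"),
        "style:num_uppercase_tokens": sum(1 for t in tokens if t.isupper()),
    }


def extract_features(text, n=3):
    """Merge character n-gram counts and stylistic counts into one dict.

    Prefixes keep the feature groups from colliding with each other.
    """
    features = {
        f"char{n}:{gram}": count
        for gram, count in char_ngrams(text, n).items()
    }
    features.update(style_features(text))
    return features


example = extract_features("My mom is GREAT!")
```

Feature dictionaries of this kind could then be vectorized and passed to any logistic regression implementation, with one such pipeline trained per language.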