Author Profiling Using Corpus Statistics, Lexicons and Stylistic Features
Notebook for PAN at CLEF-2013

This paper describes our participation in the author profiling task of the 9th PAN evaluation lab. The proposed approach relies on the extraction of stylistic, lexicon-based and corpus-based features, which were combined using a logistic classifier. These three feature sets overlap pairwise, and some features belong to all three categories. A comprehensive comparison of the contribution of several feature subsets is presented. In particular, a set of features based on Bayesian inference provided the most important contribution. We developed our system on the Spanish training corpus; once developed, it was applied, with minor changes, to the English documents as well. The proposed system was ranked 6th in the official ranking for Spanish documents among 17 submitted systems. This result shows that our approach is meaningful and competitive for predicting demographics from text.
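To make the overall pipeline concrete, the following is a minimal sketch of how one stylistic, one lexicon-based and one corpus-based (Bayesian) feature might be combined with a logistic classifier. It is an illustration under stated assumptions, not the system described in this paper: the toy data, the `LEXICON` word list, the specific features (exclamation-mark rate, lexicon hit rate, smoothed log-likelihood ratio) and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: label 1 / 0 stand in for two demographic classes.
TRAIN = [
    ("I love my new shoes!!! so cute", 1),
    ("omg best day ever, love you all", 1),
    ("the match was great, we won the game", 0),
    ("fixed the car engine today, great game later", 0),
]
LEXICON = {"love", "cute", "omg"}  # hypothetical class-indicative lexicon


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")]


def class_word_counts(data):
    """Corpus statistics: word counts per class."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in data:
        counts[label].update(tokenize(text))
    return counts


def bayes_log_odds(tokens, counts):
    """Mean per-token log-likelihood ratio log P(w|1)/P(w|0), add-one smoothed."""
    vocab_size = len(set(counts[0]) | set(counts[1]))
    n0, n1 = sum(counts[0].values()), sum(counts[1].values())
    score = 0.0
    for t in tokens:
        p1 = (counts[1][t] + 1) / (n1 + vocab_size)
        p0 = (counts[0][t] + 1) / (n0 + vocab_size)
        score += math.log(p1 / p0)
    return score / max(len(tokens), 1)


def features(text, counts):
    """Feature vector: [bias, stylistic, lexicon, corpus-based]."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    stylistic = text.count("!") / max(len(text), 1)  # exclamation-mark rate
    lexicon = sum(t in LEXICON for t in tokens) / n  # share of lexicon words
    corpus = bayes_log_odds(tokens, counts)          # Bayesian evidence
    return [1.0, stylistic, lexicon, corpus]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by plain stochastic gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (target - p) * xi for wi, xi in zip(w, x)]
    return w


def predict_proba(text, w, counts):
    """Probability that the text belongs to class 1."""
    x = features(text, counts)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


counts = class_word_counts(TRAIN)
X = [features(text, counts) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
weights = train_logistic(X, y)
```

Calling `predict_proba("some new text", weights, counts)` returns the estimated probability of class 1. In this sketch the corpus-based feature is computed from the same texts used to fit the classifier, which is acceptable for illustration only; a realistic setup would estimate the corpus statistics on held-out data to avoid overfitting.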