UniNE at CLEF 2016: Author Profiling

This paper describes and evaluates an author profiling model called SPATIUM-L1. The suggested strategy can be readily adapted to different Indo-European languages (such as Dutch, English, and Spanish). As features, we suggest using the m most frequent terms of the query text (isolated words and punctuation symbols, with m at most 200). Applying a simple distance measure and looking at the five nearest neighbors, we can determine the gender (with the nominal values “male” or “female”) and the age group (on the ordinal scale 18-24 | 25-34 | 35-49 | 50-64 | >65). Although the labeled data consist of Twitter tweets, the evaluations are based on three test collections from a different, undisclosed genre (blogs, reviews, social media, ...) (PAN AUTHOR PROFILING task at CLEF 2016).
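The strategy summarized above can be illustrated with a minimal sketch. Several details are assumptions made for illustration and are not stated in this abstract: the distance is taken to be the L1 (Manhattan) distance between relative term frequencies, as the model name suggests; the tokenizer is a simple regular expression; and the five neighbors are combined by majority vote. The function names (tokenize, l1_distance, predict) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text, then keep isolated words and
    # single punctuation symbols as separate terms.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def relative_freqs(tokens):
    # Relative frequency of each term in a token list.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def l1_distance(query_text, profile_text, m=200):
    # Features: the m most frequent terms of the query text.
    q_tokens = tokenize(query_text)
    top_terms = [term for term, _ in Counter(q_tokens).most_common(m)]
    q = relative_freqs(q_tokens)
    p = relative_freqs(tokenize(profile_text))
    # Assumed distance: sum of absolute differences of relative frequencies.
    return sum(abs(q[term] - p.get(term, 0.0)) for term in top_terms)


def predict(query_text, labeled, k=5, m=200):
    # labeled: list of (text, label) pairs, e.g. label = "male" / "female"
    # or an age group such as "25-34".
    ranked = sorted(labeled, key=lambda item: l1_distance(query_text, item[0], m))
    # Assumption: majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The same predict function can be called once with gender labels and once with age-group labels. A plain majority vote ignores the ordinal nature of the age groups, which is a simplification of this sketch.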