This paper briefly describes the methods we adopted for the author profiling task of the PAN 2016 competition [1]. Author profiling is the task of predicting an author's age and gender from their writing. We follow a two-level ensemble approach to tackle the cross-genre author profiling task, in which training and test documents come from different genres. We use soft voting to build the classification ensemble. To include various feature sets, we first train logistic regression models on word n-gram, character n-gram, and part-of-speech n-gram features extracted for each genre. We then combine the single-genre predictive models trained on the blog, social media, and Twitter data sources into a multi-genre ensemble. The experimental results indicate that our approach performs well in both single-genre and cross-genre author profiling tasks.
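The two-level soft-voting scheme can be illustrated with a minimal sketch. It assumes unweighted averaging of class probabilities at both levels, which the text above does not specify. The function names and the probability values are hypothetical placeholders standing in for the outputs of the per-feature logistic regression models.

```python
from typing import Dict, List

Proba = Dict[str, float]  # class label -> predicted probability


def soft_vote(predictions: List[Proba]) -> Proba:
    """Average class probabilities across models (unweighted soft voting)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    labels = predictions[0].keys()
    n = len(predictions)
    return {label: sum(p[label] for p in predictions) / n for label in labels}


def two_level_ensemble(per_genre: Dict[str, Dict[str, Proba]]) -> str:
    """Level 1 votes over feature-set models within each genre;
    level 2 votes over the resulting single-genre predictions."""
    genre_level = [soft_vote(list(models.values())) for models in per_genre.values()]
    combined = soft_vote(genre_level)
    return max(combined, key=combined.get)


# Hypothetical probabilities for one test document (gender task).
# In practice each entry would come from a trained logistic regression model.
example = {
    "blog": {
        "word_ngram": {"female": 0.60, "male": 0.40},
        "char_ngram": {"female": 0.55, "male": 0.45},
        "pos_ngram": {"female": 0.40, "male": 0.60},
    },
    "social_media": {
        "word_ngram": {"female": 0.30, "male": 0.70},
        "char_ngram": {"female": 0.45, "male": 0.55},
        "pos_ngram": {"female": 0.50, "male": 0.50},
    },
    "twitter": {
        "word_ngram": {"female": 0.70, "male": 0.30},
        "char_ngram": {"female": 0.65, "male": 0.35},
        "pos_ngram": {"female": 0.60, "male": 0.40},
    },
}

predicted_label = two_level_ensemble(example)
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh several uncertain ones, which is the usual motivation for soft voting over majority voting.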
[1] Marie-Francine Moens, et al. Computational personality recognition in social media. 2016, User Modeling and User-Adapted Interaction.
[2] Anat Rachel Shimoni, et al. Gender, genre, and writing style in formal written texts. 2003.
[3] Benno Stein, et al. Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling. 2014, CLEF.
[4] Benno Stein, et al. Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations. 2016, CLEF.
[5] José Carlos González, et al. DAEDALUS at PAN 2014: Guessing Tweet Author's Gender and Age. 2014, CLEF.
[6] Benno Stein, et al. Overview of the 3rd Author Profiling Task at PAN 2015. 2015, CLEF.