Two Methodologies Applied to the Author Profiling Task
This paper describes two methodologies applied to the author profiling task and submitted to the PAN 2013 competition of the CLEF 2013 conference. The first methodology was applied only to the English corpus, whereas the second was applied only to the Spanish corpus. The aim was to evaluate the performance of both methodologies on this task. The results were quite positive for the first methodology, which follows a classical classification approach: diverse features are extracted from the texts and fed to a classifier based on random forests. The second methodology, based on graph mining techniques, performed very poorly on the author profiling task.

1 Description of the Methodologies Evaluated

We applied two different methodologies, one for each language. For the English corpus, we employed machine learning techniques with different sets of features; this first methodology is presented in Section 1.1. The Spanish corpus was processed with a second methodology based on graph mining techniques, which is described in Section 1.2.