In this paper we describe the participation of the Laboratory of Language Technologies of INAOE at PAN 2014. We address the Author Profiling (AP) task by finding and exploiting relationships among terms, documents, profiles and subprofiles. Our approach builds on the idea of second-order attributes (a low-dimensional and dense document representation) (4), but goes beyond incorporating information about each target profile. The proposed representation deepens the analysis by incorporating information among texts in the same profile; that is, we focus on subprofiles. To this end, we automatically find subprofiles and build document vectors that represent more detailed relationships between documents and subprofiles. We compare the proposed representation with the standard Bag-of-Terms and the best method in PAN 2013, using the PAN 2014 corpora for the AP task. Results show evidence of the usefulness of intra-profile information for determining gender and age profiles. According to the PAN 2014 official results, the proposed method was one of the three best approaches for most social media domains. In particular, it achieved the best performance in predicting age and gender profiles for blogs and tweets in English.
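To make the idea of a second-order representation over subprofiles concrete, the following is a minimal sketch and not the exact method of the paper. It assumes that subprofile labels have already been obtained (for example, by clustering the documents of each profile), that term-to-subprofile affinities use a damped relative frequency, and that a document is the frequency-weighted sum of its term vectors. The function names, the weighting, and the toy labels such as `female/s0` are hypothetical choices made only for illustration.

```python
import math
from collections import Counter, defaultdict


def term_vectors(docs, labels):
    """Map each term to a normalized vector of affinities with each (sub)profile.

    docs   : list of token lists
    labels : one (sub)profile label per document (assumed to be given)
    """
    groups = sorted(set(labels))
    index = {g: i for i, g in enumerate(groups)}
    raw = defaultdict(lambda: [0.0] * len(groups))
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            # Assumed weighting: damped relative frequency of the term in the document.
            raw[term][index[label]] += math.log2(1 + tf / len(tokens))
    # Normalize each term vector so that its entries sum to 1.
    vectors = {}
    for term, vec in raw.items():
        total = sum(vec)
        vectors[term] = [v / total for v in vec]
    return groups, vectors


def document_vector(tokens, vectors, n_groups):
    """Represent a document as the frequency-weighted sum of its term vectors."""
    doc = [0.0] * n_groups
    # Terms never seen in training carry no (sub)profile information and are skipped.
    known = [t for t in tokens if t in vectors]
    for term, tf in Counter(known).items():
        weight = tf / len(tokens)
        for i, v in enumerate(vectors[term]):
            doc[i] += weight * v
    return doc


# Toy data: two profiles, each split into two hypothetical subprofiles.
docs = [
    ["love", "shopping", "love"],
    ["football", "beer"],
    ["love", "football"],
    ["beer", "beer", "football"],
]
labels = ["female/s0", "male/s0", "female/s1", "male/s1"]

groups, vectors = term_vectors(docs, labels)
new_doc = document_vector(["love", "beer", "football"], vectors, len(groups))
```

The resulting vector has one dimension per subprofile rather than one per vocabulary term, which is what makes the representation low-dimensional and dense; it could then be passed to any standard classifier.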
[1] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 2008.
[2] Shlomo Argamon, et al. Automatically profiling the author of an anonymous text. CACM, 2009.
[3] Benno Stein, et al. Overview of the Author Profiling Task at PAN 2013. CLEF, 2013.
[4] Shlomo Argamon, et al. Automatically Categorizing Written Texts by Author Gender. Lit. Linguistic Comput., 2002.
[5] Shlomo Argamon, et al. Effects of Age and Gender on Blogging. AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, 2006.
[6] Hugo Jair Escalante, et al. INAOE's Participation at PAN'13: Author Profiling Task Notebook for PAN at CLEF 2013. CLEF, 2013.