Using Intra-Profile Information for Author Profiling
Notebook for PAN at CLEF 2014

In this paper we describe the participation of the Laboratory of Language Technologies of INAOE at PAN 2014. We address the Author Profiling (AP) task by finding and exploiting relationships among terms, documents, profiles, and subprofiles. Our approach builds on the idea of second order attributes (a low-dimensional and dense document representation) (4), but it goes beyond incorporating information about each target profile. The proposed representation deepens the analysis by incorporating information among texts within the same profile; that is, it focuses on subprofiles. To this end, we automatically find subprofiles and build document vectors that capture more detailed relationships between documents and subprofiles. We compare the proposed representation with the standard Bag-of-Terms and with the best method at PAN 2013, using the PAN 2014 corpora for the AP task. The results provide evidence of the usefulness of intra-profile information for determining gender and age profiles. According to the PAN 2014 official results, the proposed method was among the three best approaches for most social media domains. In particular, it achieved the best performance in predicting age and gender profiles for blogs and tweets in English.
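The core idea of a second-order, subprofile-based representation can be sketched as follows. Each term is represented by its distribution over subprofiles, and each document by the weighted sum of its term vectors, which yields one dense dimension per subprofile instead of one sparse dimension per term. This is a minimal sketch under stated assumptions: relative-frequency weights are used throughout, subprofile labels (such as the hypothetical `"female/0"`) are taken as already assigned (for example by clustering the documents of each profile, a step omitted here), and the function names `term_subprofile_vectors` and `document_vector` are illustrative. It is not the authors' exact weighting scheme.

```python
from collections import Counter, defaultdict


def term_subprofile_vectors(docs, subprofiles):
    """Represent each term by its normalized weight over subprofiles.

    Assumption: weights are relative term frequencies per document,
    accumulated per subprofile (not necessarily the paper's scheme).
    """
    labels = sorted(set(subprofiles))
    acc = defaultdict(lambda: defaultdict(float))
    for tokens, sp in zip(docs, subprofiles):
        n = len(tokens)
        for term, c in Counter(tokens).items():
            acc[term][sp] += c / n  # relative frequency within this document
    vectors = {}
    for term, per_sp in acc.items():
        total = sum(per_sp.values())
        # Normalize so each term vector sums to 1 across subprofiles.
        vectors[term] = [per_sp.get(sp, 0.0) / total for sp in labels]
    return vectors, labels


def document_vector(tokens, term_vectors, dim):
    """Map a document into subprofile space as a weighted sum of term vectors."""
    n = len(tokens)
    vec = [0.0] * dim
    for term, c in Counter(tokens).items():
        tv = term_vectors.get(term)
        if tv is None:
            continue  # terms unseen in training contribute nothing
        w = c / n
        for i, x in enumerate(tv):
            vec[i] += w * x
    return vec


# Toy training data: two hypothetical subprofiles per gender profile.
train_docs = [
    ["love", "omg", "love"],
    ["omg", "cute"],
    ["match", "goal"],
    ["goal", "beer", "match"],
]
train_subprofiles = ["female/0", "female/1", "male/0", "male/1"]

tv, labels = term_subprofile_vectors(train_docs, train_subprofiles)
new_doc = document_vector(["love", "goal"], tv, len(labels))
print(labels)
print(new_doc)
```

Each resulting dimension measures how strongly a document relates to one subprofile. A classifier trained on these vectors can then predict the parent profile (for example, gender or age group), which is the intra-profile information the abstract refers to.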