INAOE's Participation at PAN'13: Author Profiling Task. Notebook for PAN at CLEF 2013

This paper describes the participation of the Laboratory of Language Technologies of INAOE in the PAN 2013 evaluation lab. We adopted second-order representations to address the problem of Author Profiling (AP). This representation tackles two shortcomings of the typical Bag-of-Terms representation: i) the sparsity and high dimensionality of document representations, and ii) the assumption of total independence between terms in documents. To overcome these problems, the proposed representation builds document vectors in a space of profiles, where each dimension captures the relationship of a document with one profile (say, age and gender). To evaluate our approach, we compare the proposed representation against a standard Bag-of-Terms representation on the PAN 2013 corpus for AP. We found evidence that second-order attributes, obtained at a low computational cost, are useful for determining the gender and age profile of an author.
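The idea of building document vectors in a space of profiles can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so the details here are assumptions made for illustration only: the log-scaled relative term frequency, the per-term normalization, the function names `term_profile_vectors` and `document_profile_vector`, and the toy documents and labels. Each term is first described by its affinity with every profile, and a document is then the frequency-weighted sum of the vectors of its terms, so its dimensionality equals the number of profiles rather than the vocabulary size.

```python
import math
from collections import Counter, defaultdict


def term_profile_vectors(docs, labels):
    """Describe each term by its affinity with every profile.

    docs   -- list of token lists (training documents)
    labels -- profile label of each document
    The weighting (log-scaled relative frequency) is an assumption.
    """
    profiles = sorted(set(labels))
    index = {p: i for i, p in enumerate(profiles)}
    vectors = defaultdict(lambda: [0.0] * len(profiles))
    for tokens, label in zip(docs, labels):
        if not tokens:
            continue
        for term, tf in Counter(tokens).items():
            vectors[term][index[label]] += math.log2(1 + tf / len(tokens))
    # Normalize so that the weights of each term sum to 1 across profiles.
    normalized = {}
    for term, vec in vectors.items():
        total = sum(vec)
        normalized[term] = [w / total for w in vec]
    return profiles, normalized


def document_profile_vector(tokens, term_vectors, n_profiles):
    """Represent a document as the weighted sum of its term vectors."""
    vec = [0.0] * n_profiles
    if not tokens:
        return vec
    for term, tf in Counter(tokens).items():
        term_vec = term_vectors.get(term)
        if term_vec is None:  # term unseen in training, skipped
            continue
        for i, weight in enumerate(term_vec):
            vec[i] += (tf / len(tokens)) * weight
    return vec


# Hypothetical toy data: two profiles, four tiny documents.
train_docs = [
    ["love", "shopping", "love"],
    ["football", "beer"],
    ["homework", "love"],
    ["mortgage", "football"],
]
train_labels = ["female", "male", "female", "male"]

profiles, term_vecs = term_profile_vectors(train_docs, train_labels)
doc_vec = document_profile_vector(
    ["love", "love", "football"], term_vecs, len(profiles)
)
print(profiles, doc_vec)
```

The resulting low-dimensional vectors could then be passed to any standard classifier. Because a term's vector is built from the documents it occurs in, terms are no longer treated as fully independent dimensions.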
