Author Profiling in the Wild

In this paper, we use machine learning to profile authors of online textual media, focusing on determining an author's gender and age. We use two different approaches: one where the features are learned from raw data and one where the features are manually extracted. Because we are interested in understanding how well author profiling works in the wild, we have tested our models on domains other than those they were trained on. Our results show that applying models to a different domain than they were trained on significantly decreases their performance. This indicates that more effort needs to be put into making models domain independent if techniques such as author profiling are to be used operationally, for example by training on many different datasets and by using domain-independent features.
