A Straightforward Multimodal Approach for Author Profiling: Notebook for PAN at CLEF 2018

In this paper we evaluate different strategies from the literature for text and image classification at PAN 2018. The main objective of this shared task is to identify the gender of users from the tweets and images they post. We evaluate four popular strategies for text representation: 1) Bag of Terms (BoT), 2) Second Order Attributes (SOA), 3) Convolutional Neural Network (CNN) models, and 4) an Ensemble of n-grams at word and character level. For the image representation we used a CNN based on [6]. We observed that the n-grams Ensemble achieved the highest performance. For our participation we chose the Ensemble and performed an early fusion with the image representation to create a multimodal representation.
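The early fusion step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the n-gram orders, the term weighting, or the classifier, so the word unigrams, character trigrams, raw counts, and all function names below are assumptions chosen for illustration. The two-dimensional image vectors are made-up placeholders standing in for CNN-derived image features.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character-level n-grams of a lowercased text."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_vocab(texts, extractor, n):
    """Map every n-gram seen in the texts to a fixed column index."""
    grams = sorted({g for t in texts for g in extractor(t, n)})
    return {g: i for i, g in enumerate(grams)}


def vectorize(text, extractor, n, vocab):
    """Count n-grams of one text into a dense vector (raw counts, an assumption)."""
    vec = [0.0] * len(vocab)
    for gram, count in Counter(extractor(text, n)).items():
        if gram in vocab:
            vec[vocab[gram]] = float(count)
    return vec


def early_fusion(text, image_features, word_vocab, char_vocab, word_n=1, char_n=3):
    """Concatenate word n-gram, char n-gram, and image features into one vector."""
    text_vec = (vectorize(text, word_ngrams, word_n, word_vocab)
                + vectorize(text, char_ngrams, char_n, char_vocab))
    return text_vec + list(image_features)


# Toy data: two users, one tweet each, with placeholder image feature vectors.
tweets = ["good morning world", "good night"]
image_feats = [[0.1, 0.9], [0.7, 0.3]]

word_vocab = build_vocab(tweets, word_ngrams, 1)
char_vocab = build_vocab(tweets, char_ngrams, 3)
fused = [early_fusion(t, f, word_vocab, char_vocab)
         for t, f in zip(tweets, image_feats)]
```

Each row of `fused` is a single multimodal vector per user that could be passed to any standard classifier. Fusing before classification (early fusion) lets one model weigh textual and visual evidence jointly, in contrast to combining the outputs of separate text and image classifiers.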

[1]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[2]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[3]  C. Walck.  Hand-book on statistical distributions for experimentalists , 1996.

[4]  Hugo Jair Escalante,et al.  INAOE's Participation at PAN'13: Author Profiling Task Notebook for PAN at CLEF 2013 , 2013, CLEF.

[5]  Senja Pollak,et al.  PAN 2017: Author Profiling - Gender and Language Variety Prediction , 2017, CLEF.

[6]  Benno Stein,et al.  Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter , 2017, CLEF.

[7]  Teresa Gonçalves,et al.  Age and Gender Identification using Stacking for Classification , 2016, CLEF.

[8]  Marina L. Gavrilova,et al.  Gender Prediction using Individual Perceptual Image Aesthetics , 2016, J. WSCG.

[9]  Benno Stein,et al.  Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter , 2018, CLEF.

[10]  Anastasia Krithara,et al.  Author Profiling using Complementary Second Order Attributes and Stylometric Features , 2016, CLEF.

[11]  Jugal K. Kalita,et al.  Deep Learning applied to NLP , 2017, ArXiv.

[12]  Xiaojun Ma,et al.  Gender estimation for SNS user profiling using automatic image annotation , 2014, 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).

[13]  Yoav Goldberg,et al.  Neural Network Methods for Natural Language Processing , 2017, Synthesis Lectures on Human Language Technologies.

[14]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[15]  John Cardiff,et al.  Twitter Author Profiling Using Word Embeddings and Logistic Regression , 2017, CLEF.

[16]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[17]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[18]  Daniel Castro-Castro,et al.  Author Profiling, instance-based Similarity Classification , 2017, CLEF.

[19]  Jennifer L. Bevan,et al.  A picture is worth a thousand words: A content analysis of Facebook profile photographs , 2011, Comput. Hum. Behav.

[20]  Wenpeng Yin,et al.  Comparative Study of CNN and RNN for Natural Language Processing , 2017, ArXiv.

[21]  Nils Schaetti.  UniNE at CLEF 2017: TF-IDF and Deep-Learning for Author Profiling , 2017, CLEF.

[22]  Tomoki Taniguchi,et al.  A Weighted Combination of Text and Image Classifiers for User Gender Inference , 2015, VL@EMNLP.

[23]  Hugo Jair Escalante,et al.  A visual approach for age and gender identification on Twitter , 2018, J. Intell. Fuzzy Syst..

[24]  Paolo Rosso,et al.  Use of Language and Author Profiling: Identification of Gender and Age , 2013.

[25]  John R. Smith,et al.  You are what you tweet…pic! gender prediction based on semantic analysis of social media images , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).

[26]  Malvina Nissim,et al.  N-GrAM: New Groningen Author-profiling Model , 2017, CLEF.

[27]  Yassine Benajiba,et al.  Subword-based Deep Averaging Networks for Author Profiling in Social Media , 2017, CLEF.

[28]  Teresa Gonçalves,et al.  Author Profiling Using Support Vector Machines , 2016, CLEF.