A Hierarchical Attention Network for Bots and Gender Profiling

Author profiling is the task of inferring attributes of an author, such as age, gender, or personality, from written text. Bot identification is particularly important today, given the growth of social media usage and the effect that opinion-influencing bots have on the public. This paper describes our solution to the Bots and Gender Profiling task introduced at PAN 2019. The task is a two-part multilingual problem covering English and Spanish. The first subtask is to determine whether an author is a human or a bot. The second is to detect the gender of the human authors. Our solution uses a deep learning model based on Hierarchical Attention Networks (HAN), with pretrained word embeddings for text representation. According to the official results, the model achieves an accuracy of 0.8943 for English and 0.8483 for Spanish on the first subtask, and 0.7485 for English and 0.6711 for Spanish on the second.
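To make the HAN idea concrete, the following is a minimal sketch of the two-level attention pooling used in the original HAN formulation: each vector is projected through a tanh layer, scored against a learned context vector, and the softmax of those scores weights the sum. It is not the system described here. It assumes that each tweet plays the role of a "sentence" and an author's feed the role of the "document", it omits the recurrent encoders and the final classifier, and it uses random, untrained parameters. All names (`attention_pool`, `encode_author`, `random_params`, `DIM`) are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, W, b, context):
    """Attention pooling: u_i = tanh(W h_i + b), alpha = softmax(u_i . context),
    output = sum_i alpha_i * h_i. Returns (pooled vector, attention weights)."""
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alpha = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * h[k] for a, h in zip(alpha, vectors)) for k in range(dim)]
    return pooled, alpha

def random_params(dim, rng):
    """Untrained placeholder parameters (assumption: a real model learns these)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    context = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return W, b, context

def encode_author(tweets, word_params, tweet_params):
    """Pool word vectors into tweet vectors, then tweet vectors into one author vector."""
    tweet_vectors = [attention_pool(words, *word_params)[0] for words in tweets]
    return attention_pool(tweet_vectors, *tweet_params)

rng = random.Random(0)
DIM = 4  # toy embedding size; pretrained embeddings are far larger
word_params = random_params(DIM, rng)
tweet_params = random_params(DIM, rng)

# Toy author with three tweets of 5, 2, and 4 word vectors each.
tweets = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n)]
          for n in (5, 2, 4)]
author_vector, tweet_weights = encode_author(tweets, word_params, tweet_params)
print(len(author_vector), [round(w, 3) for w in tweet_weights])
```

In a full model, the author vector would feed a classifier for bot versus human or for gender, and the tweet-level weights indicate which tweets contributed most to the representation.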
