A Multimodal Author Profiling System for Tweets

The rising use of social media has motivated the invention of different methods of anonymous writing, which has led to an increase in malicious and suspicious activities. This anonymity makes it difficult to identify a suspect. Author profiling deals with the characterization of an author through key attributes such as gender, age, language, dialect or regional variety, and personality. Identifying the gender of the author of a suspect document is a salient task in author profiling. The linguistic profile of a user can help in determining his or her demographics. Users regularly share their daily activities on social media platforms such as Twitter, Facebook, and Instagram, and they often post images along with text; thus, multimodal information is now very common. In this article, the task of automatic gender prediction from multimodal Twitter data is posed as a classification problem, and an efficient multimodal neural framework is proposed for solving it. The widely used BERT_base is utilized for learning the encoded representation of the text part of a tweet, and the recently introduced EfficientNet is used for extracting features from the images. Finally, a direct product-based fusion strategy is applied to fuse the text and image representations, followed by a fully connected layer that predicts the gender of a Twitter user. The plagiarism detection, authorship analysis, and near-duplicate detection (PAN) 2018 author profiling data are used to evaluate the performance of the proposed approach. The proposed model achieved accuracies of 82.05%, 86.22%, and 89.53% in the pure-image, pure-text, and multimodal settings, respectively, outperforming the previous state-of-the-art works in all cases.
Moreover, a deep analysis is carried out to interpret the results, identifying words that serve as clues for gender classification and characterize the different gender classes. The supplementary file and the source code for the proposed approach are available at https://github.com/chanchalIITP/GenderTCSS.
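
As a rough illustration of the fusion step described above, the following minimal sketch fuses a text vector and an image vector and passes the result through a single fully connected unit. It rests on several assumptions that are not taken from the article: "direct product" is read here as the flattened outer product of the two vectors (an element-wise product would be another possible reading), the feature vectors are tiny random stand-ins rather than real BERT_base or EfficientNet outputs, the weights are untrained, and the names outer_product_fusion and dense_sigmoid are hypothetical.

```python
import math
import random


def outer_product_fusion(text_vec, image_vec):
    """Flatten the outer product of two feature vectors into one fused vector.

    Assumption: this is one possible reading of "direct product" fusion.
    """
    return [t * v for t in text_vec for v in image_vec]


def dense_sigmoid(features, weights, bias):
    """One fully connected unit with a sigmoid, giving a probability for one class."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)

# Toy stand-ins for the encoders' outputs (real encodings are far larger).
text_vec = [rng.uniform(-1, 1) for _ in range(4)]   # placeholder for a text encoding
image_vec = [rng.uniform(-1, 1) for _ in range(3)]  # placeholder for image features

fused = outer_product_fusion(text_vec, image_vec)   # length 4 * 3 = 12

# Untrained, randomly initialised weights: the probability is illustrative only.
weights = [rng.uniform(-0.1, 0.1) for _ in fused]
prob = dense_sigmoid(fused, weights, bias=0.0)

print(len(fused), prob)
```

Under this reading, every text feature interacts multiplicatively with every image feature, so the fused vector grows as the product of the two input sizes; in practice the encodings would likely be projected to a lower dimension before fusion.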