CIMAT_2021 at PoliticEs 2022: Ensemble Based Classification Algorithms for Author Profiling in Spanish Language

Author profiling is an important task in the Natural Language Processing research community. Its objective is to infer characteristics of a text's author, such as gender, age, and preferences. In this paper, we present our solution to the Spanish Author Profiling for Political Ideology task at PoliticEs@IberLEF2022. The solution consists of a specialized classification model for each subtask: fine-tuned BERT models for the gender and profession subtasks, XGBoost for binary ideology, and Logistic Regression for multiclass ideology. We also applied a variety of pre-processing techniques to clean up the texts. Our final approach placed 4th in the PoliticEs contest.
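The abstract names the model used for each subtask but does not list the individual pre-processing steps. The Python sketch below is therefore only an illustration: the subtask-to-model mapping follows the abstract, while the subtask keys, the `clean_text` helper, the placeholder tokens `[URL]` and `[USER]`, and the specific cleaning rules are assumptions based on common tweet-cleaning practice, not the authors' actual pipeline.

```python
import re

# Subtask-to-model assignment as stated in the abstract.
# The dictionary keys are hypothetical names chosen for this sketch.
MODEL_BY_SUBTASK = {
    "gender": "fine-tuned BERT",
    "profession": "fine-tuned BERT",
    "ideology_binary": "XGBoost",
    "ideology_multiclass": "Logistic Regression",
}

# Assumed cleaning rules; the paper's abstract does not specify them.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Apply a few common (assumed) tweet-cleaning steps."""
    text = text.lower()                   # normalize case first
    text = URL_RE.sub("[URL]", text)      # mask links
    text = USER_RE.sub("[USER]", text)    # mask user mentions
    text = HASHTAG_RE.sub(r"\1", text)    # keep the hashtag word, drop '#'
    return SPACE_RE.sub(" ", text).strip()  # collapse whitespace


def model_for(subtask: str) -> str:
    """Return the model family the abstract assigns to a subtask."""
    return MODEL_BY_SUBTASK[subtask]
```

For example, `clean_text("Hola @usuario mira https://t.co/abc  #Elecciones2022")` lowercases the text, masks the mention and the link, and keeps the hashtag word. Lowercasing is applied before masking so that the placeholder tokens keep their uppercase form.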
