Using a Genetic Algorithm Approach to Study the Impact of Imbalanced Corpora in Sentiment Analysis

The SVM classifier has been used in many methods for identifying emotions in text because of its good generalization capability and its robustness to high-dimensional data. However, most text corpora subjected to such methods are naturally imbalanced. As a consequence, the SVM, which is sensitive to imbalanced data, assigns most texts to the majority class. In this article, we present a Genetic Algorithm-based approach that aims to reduce data imbalance in the context of emotion identification. We apply this approach to a method for identifying emotions in texts written in Brazilian Portuguese and study its impact. Experiments showed that balancing the corpus could be an alternative when using the SVM classifier for emotion identification, especially in a multiclass configuration.
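The abstract does not specify how the Genetic Algorithm encodes or scores a candidate corpus, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each individual is a keep/drop mask over the texts (a form of undersampling) and that fitness rewards a low ratio between the largest and smallest class, with corpus size as a tie-breaker. The function names, the parameters, and the emotion labels used in the demo are all hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def fitness(mask, labels, size_weight=0.001):
    """Score a keep/drop mask: low imbalance first, corpus size as tie-breaker.

    Assumed fitness, chosen for illustration only.
    """
    kept = [y for keep, y in zip(mask, labels) if keep]
    if set(kept) != set(labels):
        return float("-inf")  # a mask that erases a class is never acceptable
    return -imbalance_ratio(kept) + size_weight * len(kept)


def evolve_mask(labels, pop_size=40, generations=100, mutation_rate=0.02, seed=0):
    """Evolve a 0/1 mask over the corpus (1 = keep the text, 0 = drop it)."""
    rng = random.Random(seed)
    n = len(labels)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, labels)

    for _ in range(generations):
        best = max(population, key=score)
        next_gen = [best[:]]  # elitism: the best mask survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of 3 random masks becomes a parent.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced label distribution.
    labels = ["joy"] * 60 + ["sadness"] * 25 + ["anger"] * 15
    mask = evolve_mask(labels)
    kept = [y for keep, y in zip(mask, labels) if keep]
    print("before:", Counter(labels))
    print("after: ", Counter(kept))
```

In a full pipeline, the texts selected by the mask would form the training set for the SVM. A fitness based on classifier performance on held-out data would be a natural alternative to the ratio-based score assumed here.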
