IMPACTO DEL DESEQUILIBRIO DE CLASES EN EL ENTRENAMIENTO DE REDES NEURONALES CONVOLUCIONALES EN PROBLEMAS MULTI-CLASE (IMPACT OF CLASS IMBALANCE IN THE TRAINING OF CONVOLUTIONAL NEURAL NETWORKS FOR MULTI-CLASS PROBLEMS)

The class imbalance problem in machine learning arises when the underlying training set contains an unequal number of samples per class, so that data from a few classes clearly dominate. On the surface, most classifiers appear to learn such datasets; however, they generalize poorly because of a strong bias toward the majority classes. This article presents a systematic study aimed at understanding how class imbalance affects the performance of a convolutional neural network trained for an image classification task, and proposes a methodology to correct overfitting and improve the generalization of the network.
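The abstract does not spell out the authors' corrective methodology, so the snippet below is only a generic illustration of one common mitigation for class imbalance, not the paper's method: weighting the cross-entropy loss inversely to class frequency when training a CNN in PyTorch. The class counts, batch contents, and tensor shapes are hypothetical placeholders.

    import torch
    import torch.nn as nn

    # Hypothetical per-class sample counts of an imbalanced multi-class training set.
    class_counts = torch.tensor([5000.0, 2500.0, 500.0, 100.0])

    # Weight each class inversely to its frequency, normalized so the weights average to 1;
    # mistakes on minority classes then contribute more to the loss.
    class_weights = class_counts.sum() / (len(class_counts) * class_counts)

    criterion = nn.CrossEntropyLoss(weight=class_weights)

    # Dummy forward pass: a batch of 8 predictions over 4 classes and their labels.
    logits = torch.randn(8, 4)
    targets = torch.randint(0, 4, (8,))
    loss = criterion(logits, targets)  # weighted loss used for backpropagation

An alternative with the same intent is to oversample the minority classes at the data-loading stage, for example with torch.utils.data.WeightedRandomSampler.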
