Optimization Model of K-Means Clustering Using Artificial Neural Networks to Handle Class Imbalance Problem

Class imbalance is a situation where the number of instances in one class is much higher than in the other classes. In clustering, this problem not only affects prediction accuracy but also introduces bias into the decision-making process. In this situation, a machine learning technique yields good prediction accuracy for classes with a large number of training instances, but poor accuracy for classes with a small number of instances. In this research, we propose an approach for optimizing K-Means clustering to handle the class imbalance problem. The approach uses a perceptron feed-forward neural network to determine the coordinates of each cluster centroid during the K-Means clustering process. The data used in this research are datasets from the UCI Machine Learning Repository. The experimental results indicate that the proposed approach can optimize the result of K-Means clustering in terms of minimizing class imbalance.
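To make the setting concrete, the following is a minimal sketch of plain K-Means in which the centroid update is a replaceable function. It is not the method proposed here: the abstract does not specify the network architecture, its inputs, or its training, so the `update_centroid` hook is only a hypothetical place where a perceptron-based centroid estimator could be plugged in, and the default shown is the ordinary coordinate-wise mean. The synthetic 90/10 data and all function names are illustrative assumptions and are not taken from the UCI datasets.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest group divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_centroid(points):
    """Standard K-Means update: coordinate-wise mean of the member points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points, k, update_centroid=mean_centroid, iters=50, seed=0):
    """Lloyd-style K-Means with a pluggable centroid update.

    `update_centroid` is a hypothetical hook (an assumption of this sketch):
    it receives the points of one cluster and returns that cluster's centroid.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            new_centroids.append(update_centroid(members) if members else centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assignments


# Synthetic imbalanced data: 90 points near (0, 0) and 10 points near (5, 5).
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(90)]
minority = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(10)]
points = majority + minority
true_labels = ["majority"] * 90 + ["minority"] * 10

centroids, assignments = kmeans(points, k=2)
print("class imbalance ratio:", imbalance_ratio(true_labels))
print("cluster sizes:", sorted(Counter(assignments).values()))
```

Comparing the printed cluster sizes with the true 90/10 split gives a rough view of how the clustering treats the minority class. Passing a different function as `update_centroid` changes only the update step, which is where a learned centroid estimator would slot in under the assumptions above.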
