A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets

Although many solutions have been proposed for imbalanced classification over the past decades, many studies have pointed out that class imbalance does not degrade learning performance on its own, but only in combination with other factors. One of these factors is class overlap, which plays an even larger role in degrading classification performance yet has often been overlooked in previous studies. In this paper, we propose a density-based adaptive k nearest neighbor method, namely DBANN, which handles imbalance and overlap simultaneously. To do so, we develop a simple but effective distance adjustment strategy that adaptively finds the most reliable query neighbors. Concretely, we first partition the training data into six parts using a density-based method. Next, for each part, we modify the distance metric by considering both the local and the global distribution. Finally, the output is determined by the query neighbors selected under the new distance metric. Notably, the query neighbors of DBANN change adaptively according to the degree of imbalance and overlap. To validate the proposed method, experiments are carried out on 16 synthetic datasets and 41 real-world datasets. The results, supported by appropriate statistical tests, show that the proposed method significantly outperforms state-of-the-art methods.
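The three steps described above (density-based partition, per-part distance adjustment, neighbor vote) can be illustrated with a minimal sketch. This is not the DBANN algorithm itself, whose exact partition and distance formulas are not given in the abstract. Everything here is an assumption made for illustration: the function names `density_parts` and `adjusted_knn_predict`, the parameters `eps`, `min_pts` and `part_penalty`, the DBSCAN-style core/border/noise grouping within each class, and the use of class frequency as the "global" factor.

```python
import numpy as np

def density_parts(X, y, eps=1.0, min_pts=3):
    """Label each training point within its own class:
    0 = core, 1 = border, 2 = noise (a DBSCAN-style grouping, assumed here)."""
    parts = np.full(len(X), 2)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        Xc = X[idx]
        # Pairwise distances between points of the same class.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        within = d <= eps
        # A point is core if its eps-ball holds at least min_pts points (itself included).
        core = within.sum(axis=1) >= min_pts
        # A non-core point is border if some core point lies inside its eps-ball.
        border = ~core & (within & core[None, :]).any(axis=1)
        parts[idx[core]] = 0
        parts[idx[border]] = 1
    return parts

def adjusted_knn_predict(X, y, query, k=3, eps=1.0, min_pts=3,
                         part_penalty=(1.0, 1.5, 3.0)):
    """k-NN vote under a rescaled distance (hypothetical adjustment rule).

    Local factor: a penalty per density part, so noise points look farther away.
    Global factor: the class frequency, so minority points look closer.
    """
    parts = density_parts(X, y, eps, min_pts)
    classes, counts = np.unique(y, return_counts=True)
    local = np.asarray(part_penalty)[parts]
    global_factor = (counts / len(y))[np.searchsorted(classes, y)]
    d = np.linalg.norm(X - query, axis=1) * local * global_factor
    nearest = np.argsort(d)[:k]
    votes, vote_counts = np.unique(y[nearest], return_counts=True)
    return votes[np.argmax(vote_counts)]
```

Because the adjusted distance shrinks for minority-class points and grows for low-density points, the set of selected neighbors shifts with the local density and the class ratio, which is the general behavior the abstract attributes to the method. The penalty values are arbitrary and would need tuning on real data.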
