Minority oversampling for imbalanced ordinal regression

Abstract

Ordinal regression data are naturally imbalanced, because samples of the boundary classes tend to occur with lower probability than those of the other classes. Traditional oversampling algorithms, the most common remedy for class imbalance, can improve the classification of minority classes, but they also cause overgeneralization: synthetic samples are created in incorrect regions. In ordinal regression, overgeneralization can damage the ordering of the spatial distribution of samples, which prevents ordinal regression models from benefiting from ordering information. In this paper, we propose a generation direction-aware Synthetic Minority oversampling technique designed specifically for imbalanced Ordinal Regression (SMOR). For each candidate generation direction, SMOR computes a selection weight that governs how likely that direction is to be used for generating synthetic samples. Because these weights take the ordering of the classes into account, candidate directions that could distort the ordinal structure of the samples tend to receive low weights. In this way, SMOR improves the ordering of minority classes without severely damaging the existing ordering of the problem. Extensive experiments with three ordinal regression classifiers show that the proposed method outperforms typical existing oversampling algorithms in terms of the Average Mean Absolute Error (AMAE) and the Maximum Mean Absolute Error (MMAE).
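The abstract describes SMOR only at a high level and does not give its weighting formula, so the sketch below illustrates the general idea of direction-aware oversampling under a clearly hypothetical assumption. As in SMOTE, a candidate generation direction is the segment from a minority sample to one of its same-class nearest neighbours; here each direction is scored by the ordinal labels of the training samples around its midpoint, with weight 1 / (1 + mean |y_j - c|). Directions that pass through regions dominated by ordinally distant classes are therefore drawn less often. This weight is an illustrative assumption, not the SMOR weight, and the names `direction_weight`, `oversample_class`, `amae_mmae`, `k_nbrs`, and `k_check` are hypothetical. `amae_mmae` computes the two evaluation metrics under their usual definitions: the mean absolute error is computed separately for each true class, then averaged (AMAE) or maximised (MMAE).

```python
import math
import random


def direction_weight(p, c, X, y, k=3, exclude=()):
    """Hypothetical ordinal-aware weight for a candidate point p of class c.

    Looks at the k training samples nearest to p (skipping indices in
    `exclude`) and penalises neighbourhoods whose labels lie far from c on
    the ordinal scale: weight = 1 / (1 + mean |y_j - c|).
    ASSUMPTION: illustrative formula only, not the weight used by SMOR.
    """
    candidates = [j for j in range(len(X)) if j not in exclude]
    nearest = sorted(candidates, key=lambda j: math.dist(p, X[j]))[:k]
    if not nearest:
        return 1.0
    mean_gap = sum(abs(y[j] - c) for j in nearest) / len(nearest)
    return 1.0 / (1.0 + mean_gap)


def oversample_class(X, y, c, n_new, k_nbrs=3, k_check=3, seed=0):
    """SMOTE-style oversampling of class c with weighted direction selection."""
    rng = random.Random(seed)
    members = [i for i, label in enumerate(y) if label == c]
    directions, weights = [], []
    for i in members:
        # Candidate directions: from x_i towards each of its k_nbrs nearest
        # same-class neighbours, as in SMOTE.
        nbrs = sorted((j for j in members if j != i),
                      key=lambda j: math.dist(X[i], X[j]))[:k_nbrs]
        for j in nbrs:
            mid = [(a + b) / 2 for a, b in zip(X[i], X[j])]
            directions.append((i, j))
            # Score the direction by the ordinal neighbourhood of its midpoint,
            # ignoring the two endpoints themselves.
            weights.append(direction_weight(mid, c, X, y, k_check, exclude=(i, j)))
    if not directions:
        return []  # fewer than two samples of class c: nothing to interpolate
    synthetic = []
    # Directions are drawn with probability proportional to their weights,
    # then a point is placed uniformly at random along the chosen segment.
    for i, j in rng.choices(directions, weights=weights, k=n_new):
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


def amae_mmae(y_true, y_pred):
    """Average and maximum of the per-class mean absolute errors."""
    errors = {}
    for t, p in zip(y_true, y_pred):
        errors.setdefault(t, []).append(abs(t - p))
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class), max(per_class)


if __name__ == "__main__":
    X = [[0.0], [0.2], [0.4], [1.0], [1.1], [1.2], [1.3], [2.0], [2.1]]
    y = [0, 0, 0, 1, 1, 1, 1, 2, 2]
    # Class 2 plays the role of a minority boundary class.
    print(oversample_class(X, y, c=2, n_new=3))
    print(amae_mmae([0, 0, 1, 2], [0, 1, 1, 0]))  # (0.8333333333333334, 2.0)
```

Computing the error per class before averaging keeps a poorly predicted minority class from being masked by well-predicted majority classes, which is why AMAE and MMAE suit imbalanced ordinal problems better than the overall MAE.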
