Exploring Resampling with Neighborhood Bias on Imbalanced Regression Problems

Imbalanced domains pose an important problem in predictive tasks: performance degrades on the cases that are most relevant to the user. This problem has been studied intensively for classification. More recently, it has been recognized that imbalanced domains also occur in several other contexts and across a diversity of task types. This paper focuses on imbalanced regression tasks. Resampling strategies are among the most successful approaches to imbalanced domains. In this work we propose variants of existing resampling strategies that take into account information about the neighborhood of each example. Instead of sampling uniformly, our proposals bias the strategies toward reinforcing some regions of the data set. In an extensive set of experiments we provide evidence of the advantage of introducing a neighborhood bias into resampling strategies.
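To make the idea of non-uniform resampling concrete, the following is a minimal sketch of one possible neighborhood bias applied to random over-sampling. It is not the procedure proposed in the paper: the relevance rule (targets at or above a hypothetical `threshold` count as rare), the weighting scheme (rare cases surrounded by common cases are replicated more often), and all function names and parameters are assumptions made purely for illustration.

```python
import math
import random

def knn_indices(X, i, k):
    """Indices of the k nearest neighbors of X[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]

def neighborhood_weights(X, is_rare, k=3, eps=0.1):
    """Assumed bias: weight each rare case by the share of common cases among its k neighbors."""
    weights = {}
    for i, rare in enumerate(is_rare):
        if rare:
            nbrs = knn_indices(X, i, k)
            frac_common = sum(not is_rare[j] for j in nbrs) / max(len(nbrs), 1)
            weights[i] = frac_common + eps  # eps keeps every rare case selectable
    return weights

def biased_oversample(X, y, threshold, n_new, k=3, seed=0):
    """Replicate n_new rare cases, drawn with neighborhood-based weights instead of uniformly."""
    rng = random.Random(seed)
    is_rare = [t >= threshold for t in y]  # assumed relevance rule, for illustration only
    w = neighborhood_weights(X, is_rare, k)
    idx = list(w)
    if not idx:  # no rare cases: nothing to replicate
        return list(X), list(y)
    picks = rng.choices(idx, weights=[w[i] for i in idx], k=n_new)
    return X + [X[i] for i in picks], y + [y[i] for i in picks]

# Toy data: one rare case (index 6) sits inside the region of common cases.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (0.25,)]
y = [1, 1, 1, 1, 10, 10, 10]
X_new, y_new = biased_oversample(X, y, threshold=5, n_new=20)
```

Setting every weight to the same value recovers uniform random over-sampling, so the weighting function is the only place where the neighborhood information enters in this sketch.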
