Imbalanced regression and extreme value prediction

Research in imbalanced domain learning has focused almost exclusively on classification tasks, where the goal is to accurately predict cases labelled with a rare class. Approaches to the equivalent problem in regression tasks remain scarce, for two main reasons. First, standard regression tasks treat all values of the target domain as equally important. Second, standard evaluation metrics focus on model performance over the most common values of the data distribution. In this paper, we present an approach to imbalanced regression tasks where the objective is to predict extreme (rare) values. The approach formalises such tasks and provides the means to optimise and evaluate predictive models, overcoming both of the factors above as well as issues in related work. First, we present an automatic, non-parametric method for obtaining relevance functions, building on the concept of relevance as a mapping of target values onto non-uniform domain preferences. Then, we propose SERA, a new evaluation metric that can both assess the effectiveness of models and optimise them towards the prediction of extreme values, while penalising severe model bias. An experimental study demonstrates that SERA provides valid and useful insights into model performance in imbalanced regression tasks.
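The two components described above, an automatic relevance function and the SERA metric, can be illustrated with a small, dependency-free Python sketch. It rests on assumptions that the abstract does not state. The relevance function phi is assumed to give relevance 0 to the median and relevance 1 to the boxplot adjacent values (the whisker ends), with the control points joined by a monotone piecewise cubic Hermite interpolant (PCHIP) and values beyond the whiskers held at 1. SERA is assumed to be the area under SER_t = sum of (yhat_i - y_i)^2 over the cases with phi(y_i) >= t, for relevance thresholds t from 0 to 1. The helper names (boxplot_relevance, pchip, sera) are hypothetical.

```python
from bisect import bisect_right
from statistics import median, quantiles


def _end_slope(h0, h1, d0, d1):
    # One-sided three-point slope estimate at an end knot, with the usual
    # shape-preserving safeguards so the curve stays monotone.
    m = ((2 * h0 + h1) * d0 - h0 * d1) / (h0 + h1)
    if m * d0 <= 0:
        return 0.0
    if d0 * d1 < 0 and abs(m) > 3 * abs(d0):
        return 3 * d0
    return m


def pchip(xs, ys):
    """Monotone piecewise cubic Hermite interpolant through (xs, ys).

    xs must be strictly increasing; outside [xs[0], xs[-1]] the end values
    are held constant.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    # Interior slopes: weighted harmonic mean of the neighbouring secants,
    # or 0 at a local extremum (here: the median, where relevance is 0).
    for k in range(1, n - 1):
        if d[k - 1] * d[k] > 0:
            w1, w2 = 2 * h[k] + h[k - 1], h[k] + 2 * h[k - 1]
            m[k] = (w1 + w2) / (w1 / d[k - 1] + w2 / d[k])
    if n == 2:
        m[0] = m[1] = d[0]  # two knots: the interpolant is a straight line
    else:
        m[0] = _end_slope(h[0], h[1], d[0], d[1])
        m[-1] = _end_slope(h[-1], h[-2], d[-1], d[-2])

    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1
        t = (x - xs[i]) / h[i]
        # Cubic Hermite basis functions on the unit interval.
        h00 = (1 + 2 * t) * (1 - t) ** 2
        h10 = t * (1 - t) ** 2
        h01 = t * t * (3 - 2 * t)
        h11 = t * t * (t - 1)
        return (h00 * ys[i] + h10 * h[i] * m[i]
                + h01 * ys[i + 1] + h11 * h[i] * m[i + 1])

    return f


def boxplot_relevance(y):
    """Hypothetical automatic relevance function phi: target value -> [0, 1].

    Assumption: phi(median) = 0 and phi(adjacent values) = 1, joined by PCHIP.
    """
    q1, _, q3 = quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    lo = min(v for v in y if v >= q1 - 1.5 * iqr)  # lower adjacent value
    hi = max(v for v in y if v <= q3 + 1.5 * iqr)  # upper adjacent value
    med = median(y)
    if not lo < med < hi:
        raise ValueError("whisker ends and median must be distinct")
    return pchip([lo, med, hi], [1.0, 0.0, 1.0])


def sera(y_true, y_pred, phi, step=0.001):
    """SERA under the assumed definition, integrated with the trapezoidal rule."""
    rel = [phi(v) for v in y_true]
    sq_err = [(p - v) ** 2 for v, p in zip(y_true, y_pred)]
    n_steps = round(1 / step)
    ts = [k / n_steps for k in range(n_steps + 1)]
    # SER_t: squared error over the cases whose relevance is at least t.
    ser = [sum(e for r, e in zip(rel, sq_err) if r >= t) for t in ts]
    return sum((ser[k] + ser[k + 1]) / 2 / n_steps for k in range(n_steps))


y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
phi = boxplot_relevance(y)
print(phi(5.5), phi(30))  # 0.0 1.0 (median vs. value beyond the upper whisker)
shrunk = [0.5 * v + 3.75 for v in y]  # hypothetical model pulled toward the mean (7.5)
print(sera(y, shrunk, phi))
print(sum(phi(v) * (p - v) ** 2 for v, p in zip(y, shrunk)))  # closed form
```

Because each case contributes its squared error for exactly the thresholds t in [0, phi(y_i)], this definition of SERA reduces to the sum of phi(y_i) * (yhat_i - y_i)^2, which the last line uses as a check on the trapezoidal approximation. A model that shrinks predictions toward the mean therefore pays most for its errors on the high-relevance extremes. Where SciPy is available, scipy.interpolate.PchipInterpolator could replace the hand-written interpolant.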
