Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology

Performance metrics (error measures) are vital components of evaluation frameworks in various fields. This study reviews a variety of performance metrics and approaches to their classification. Its main goal is to develop a typology that improves our knowledge and understanding of metrics and facilitates their selection in machine learning regression, forecasting and prognostics. Based on an analysis of the structure of numerous performance metrics, we propose a framework of metrics comprising four categories: primary metrics, extended metrics, composite metrics, and hybrid sets of metrics. The paper identifies three key components (dimensions) that determine the structure and properties of primary metrics: the method of determining point distance, the method of normalization, and the method of aggregating point distances over a data set. Around these key components, the paper proposes a new typology of primary metrics and shows that it covers most of the commonly used primary metrics, more than 40 in total. The main contribution of this paper is in ordering knowledge of performance metrics and enhancing understanding of their structure and properties through the proposed typology, a generic mathematical formula for primary metrics, and a visualization chart.
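To make the three-component view of primary metrics concrete, the sketch below composes a point-distance function, a normalization step, and an aggregation rule to reproduce familiar measures such as MAE, RMSE and MAPE. This is a minimal illustrative example in Python, not the paper's formal generic formula; all function names are assumptions introduced here for clarity.

```python
import numpy as np

# Component 1 -- point distance: how the gap between an actual value y
# and a predicted value y_hat is measured at each data point.
def abs_error(y, y_hat):
    return np.abs(y - y_hat)

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

# Component 2 -- normalization: optional scaling of each point distance.
def no_norm(d, y):
    return d

def norm_by_actual(d, y):
    return d / np.abs(y)

# Component 3 -- aggregation: how per-point distances are combined
# over the whole data set.
def arithmetic_mean(d):
    return np.mean(d)

def root_mean(d):
    return np.sqrt(np.mean(d))

def primary_metric(y, y_hat, distance, normalize, aggregate):
    """Compose the three components into a single error measure."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    d = normalize(distance(y, y_hat), y)
    return aggregate(d)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mae  = primary_metric(y_true, y_pred, abs_error,     no_norm,        arithmetic_mean)
rmse = primary_metric(y_true, y_pred, squared_error, no_norm,        root_mean)
mape = primary_metric(y_true, y_pred, abs_error,     norm_by_actual, arithmetic_mean) * 100
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.1f}%")
```

Under this composition, a metric's properties (scale dependence, sensitivity to outliers, behavior near zero actuals) follow from the particular choices made in each of the three slots.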
