Overview of Machine Learning Process Modelling

Much research has been conducted on machine learning algorithms; however, the question of how to describe an artificial learner's empirical performance in general terms has largely remained unanswered, and no general, restriction-free theory of such performance has yet been developed. In this study, we investigate which function most appropriately describes the learning curves produced by several machine learning algorithms, and how well these curves predict an algorithm's future performance. Decision trees, neural networks, Naïve Bayes, and Support Vector Machines were applied to 130 datasets from publicly available repositories. Three functions (power, logarithmic, and exponential) were fitted to the measured outputs. Using rigorous statistical methods and two goodness-of-fit measures, the power-law model proved the most appropriate for describing the learning curves produced by the algorithms, both in goodness-of-fit and in predictive capability. The presented study, the first of its kind in scale and rigour, provides results (and methods) that can be used to assess the performance of novel or existing artificial learners and to forecast their 'capacity to learn' from the amount of available or desired data.
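The model-selection procedure described above can be illustrated with a minimal sketch: fit the three candidate curve families (power, logarithmic, exponential) to a learning curve of error versus training-set size, and compare them by R². The fitting here uses simple linearised least squares on synthetic data, purely for illustration; the study itself uses its own fitting and statistical protocol, and all function and variable names below are hypothetical.

```python
import math

def ols(xs, ys):
    # ordinary least squares for y = slope*x + intercept
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(ys, preds):
    # coefficient of determination on the original (untransformed) scale
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def fit_learning_curves(ns, errs):
    """Fit power, logarithmic and exponential models by linearisation;
    return the R^2 of each model on the original scale."""
    scores = {}
    # power law: e = a * n^(-b)  =>  ln e = ln a - b * ln n
    slope, icpt = ols([math.log(n) for n in ns], [math.log(e) for e in errs])
    a, b = math.exp(icpt), -slope
    scores["power"] = r_squared(errs, [a * n ** (-b) for n in ns])
    # logarithmic: e = a - b * ln n (already linear in ln n)
    slope, icpt = ols([math.log(n) for n in ns], errs)
    scores["logarithmic"] = r_squared(errs, [icpt + slope * math.log(n) for n in ns])
    # exponential: e = a * exp(-b * n)  =>  ln e = ln a - b * n
    slope, icpt = ols(list(ns), [math.log(e) for e in errs])
    a, b = math.exp(icpt), -slope
    scores["exponential"] = r_squared(errs, [a * math.exp(-b * n) for n in ns])
    return scores

# synthetic learning curve generated from a power law
ns = [100, 200, 400, 800, 1600, 3200]
errs = [0.5 * n ** -0.3 for n in ns]
scores = fit_learning_curves(ns, errs)
best = max(scores, key=scores.get)
print(best, scores)
```

On power-law-generated data the power model attains the highest R², mirroring the comparison the study performs across 130 real datasets (where, additionally, prediction error on held-out portions of the curve is assessed).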
