Making Early Predictions of the Accuracy of Machine Learning Applications

The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy on unseen data of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this paper we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set, the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on the training set's size, and also on its specific composition. In particular, we hypothesise that if a number of classifiers are produced, and their observed error is decomposed into bias and variance terms, then although these components may behave differently, their behaviour may be predictable. We test our hypothesis by building models that, given a measurement taken from the classifier created from a limited number of samples, predict the values that would be measured from the classifier produced when the full data set is presented. We create separate models for bias, variance, and total error. Our models are built from the results of applying ten
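
To illustrate the kind of measurement the abstract describes, the sketch below trains a classifier repeatedly on small subsamples and decomposes the observed 0/1 loss into rough bias and variance terms, roughly in the spirit of the Kohavi-Wolpert and Domingos decompositions. It is a minimal, hypothetical example, not the paper's experimental protocol: the choice of a decision tree, the scikit-learn dataset, the subsample sizes, and the simple "main prediction" accounting are all illustrative assumptions, and the paper's exact bias/variance accounting may differ.

    # Minimal sketch (illustrative only): estimate bias/variance terms of 0/1 loss
    # by training a classifier on repeated subsamples of increasing size.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def bias_variance_01(model_factory, X_train, y_train, X_test, y_test,
                         n_rounds=30, subsample=200, seed=0):
        """Rough bias/variance/error estimates from repeated training subsamples."""
        rng = np.random.default_rng(seed)
        preds = np.empty((n_rounds, len(y_test)), dtype=int)
        for r in range(n_rounds):
            idx = rng.choice(len(y_train), size=subsample, replace=True)
            clf = model_factory().fit(X_train[idx], y_train[idx])
            preds[r] = clf.predict(X_test)
        # "Main prediction": the label each test point receives most often.
        main_pred = np.array([np.bincount(col).argmax() for col in preds.T])
        bias = np.mean(main_pred != y_test)              # main prediction is wrong
        variance = np.mean(preds != main_pred[None, :])  # disagreement with main prediction
        error = np.mean(preds != y_test[None, :])        # average 0/1 loss over rounds
        return bias, variance, error

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    for size in (50, 100, 200, 400):  # watch how the terms move as training data grows
        b, v, e = bias_variance_01(lambda: DecisionTreeClassifier(),
                                   X_tr, y_tr, X_te, y_te, subsample=size)
        print(f"n={size:4d}  bias={b:.3f}  variance={v:.3f}  error={e:.3f}")

Measurements like these, taken at small sample sizes, are the inputs from which separate predictive models for bias, variance, and total error can then be built.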
