Learning with few examples: An empirical study on leading classifiers

Learning algorithms have proved their ability to deal with large amounts of data. Most statistical approaches rely on training sets of fixed size and produce static models. However, in specific situations such as active or incremental learning, the learning task starts with only very few data. In such cases, it becomes necessary to look for algorithms able to produce models from only a few examples. Classifiers from the literature are generally evaluated with criteria such as accuracy or the ability to order data (ranking), but the resulting ranking of classifiers can change dramatically when the focus is on the ability to learn from just a few examples. To our knowledge, only a few studies have addressed this problem. The study presented in this paper examines a larger panel of both algorithms (9 different kinds) and data sets (17 UCI bases).
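
As an illustration of the kind of protocol such a comparison involves, the sketch below trains a handful of standard classifiers on increasingly small, stratified subsets of a single data set and reports test accuracy at each training size. This is a minimal sketch only, assuming scikit-learn implementations and the Iris data as stand-ins; the actual classifier families, the 17 UCI bases and the evaluation criteria used in the paper are not reproduced here.

# Minimal sketch (assumption: scikit-learn classifiers as stand-ins for those
# compared in the paper): train a few standard classifiers on increasingly
# small stratified subsets of one data set and report test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hold out a fixed test set; the remaining examples are subsampled below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

classifiers = {
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

for n in (6, 12, 24, 48):  # training-set sizes, from very few examples upward
    scores = []
    for name, clf in classifiers.items():
        # Stratified subsample of n training examples.
        X_sub, _, y_sub, _ = train_test_split(
            X_train, y_train, train_size=n, stratify=y_train, random_state=0)
        clf.fit(X_sub, y_sub)
        scores.append(f"{name}: {clf.score(X_test, y_test):.2f}")
    print(f"n={n:2d}  " + "   ".join(scores))

A single random draw per size is used only to keep the sketch short; averaging accuracy over several subsamples per size would give a more reliable picture of how each classifier behaves with few examples.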
