Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms

A coreset is a subset of the training set, using which a machine learning algorithm obtains performances similar to what it would deliver if trained over the whole original data. Coreset discovery is an active and open line of research as it allows improving training speed for the algorithms and may help human understanding the results. Building on previous works, a novel approach is presented: candidate corsets are iteratively optimized, adding and removing samples. As there is an obvious trade-off between limiting training size and quality of the results, a multi-objective evolutionary algorithm is used to minimize simultaneously the number of points in the set and the classification error. Experimental results on non-trivial benchmarks show that the proposed approach is able to deliver results that allow a classifier to obtain lower error and better ability of generalizing on unseen data than state-of-the-art coreset discovery techniques.

[1]  Pat Langley,et al.  Crafting Papers on Machine Learning , 2000, ICML.

[2]  Nikolaus Hansen,et al.  Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.

[3]  Trevor Campbell,et al.  Automated Scalable Bayesian Inference via Hilbert Coresets , 2017, J. Mach. Learn. Res..

[4]  Filippo Menczer,et al.  Feature selection in unsupervised learning via evolutionary search , 2000, KDD '00.

[5]  Luís Torgo,et al.  OpenML: networked science in machine learning , 2014, SKDD.

[6]  Andreas Krause,et al.  Practical Coreset Constructions for Machine Learning , 2017, 1703.06476.

[7]  Pietro Barbiero,et al.  Making Sense of Economics Datasets with Evolutionary Coresets , 2019 .

[8]  R. Stolzenberg,et al.  Multiple Regression Analysis , 2004 .

[9]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[10]  Trevor Campbell,et al.  Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent , 2018, ICML.

[11]  Kenneth A. De Jong,et al.  Genetic algorithms as a tool for feature selection in machine learning , 1992, Proceedings Fourth International Conference on Tools with Artificial Intelligence TAI '92.

[12]  Nancy Chinchor,et al.  MUC-3 evaluation metrics , 1991, MUC.

[13]  Y. C. Pati,et al.  Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[14]  Mario Bertero,et al.  The Stability of Inverse Problems , 1980 .

[15]  T. Sørensen,et al.  A method of establishing group of equal amplitude in plant sociobiology based on similarity of species content and its application to analyses of the vegetation on Danish commons , 1948 .

[16]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[17]  Giovanni Squillero,et al.  Evolutionary discovery of coresets for classification , 2019, GECCO.

[18]  Marcus Frean,et al.  The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks , 1990, Neural Computation.

[19]  Venkat Reddy Konasani,et al.  Multiple Regression Analysis , 2015 .

[20]  Kenneth L. Clarkson,et al.  Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm , 2008, SODA '08.

[21]  Vittorio Maniezzo,et al.  Genetic evolution of the topology and weight distribution of neural networks , 1994, IEEE Trans. Neural Networks.

[22]  Peter J. Angeline,et al.  An evolutionary algorithm that constructs recurrent neural networks , 1994, IEEE Trans. Neural Networks.

[23]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[24]  Trevor Campbell,et al.  Sparse Variational Inference: Bayesian Coresets from Scratch , 2019, NeurIPS.