Paper Information - TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

As data science becomes increasingly mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (AutoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this chapter we present TPOT v0.3, an open source genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 21 of them, while experiencing minimal degradation in accuracy on 4 of the benchmarks, all without any domain knowledge or human input. As such, genetic programming-based AutoML systems show considerable promise in the AutoML domain.

Randal S. Olson | Jason H. Moore
