Predict or screen your expensive assay: DoE vs. surrogates in experimental combinatorial optimization

Statistics-based Design of Experiments (DoE) methodologies are considered the gold standard for screening predefined experimental assays on existing laboratory equipment. Thus far, very little is formally known about their effectiveness for global optimization, particularly when compared to Evolutionary Algorithms (EAs). The current study, which was motivated by evolution-in-the-loop of functional protein expression, aims to conduct such a comparison with a focus on Combinatorial Optimization, considering a dozen decision variables and a search-space cardinality of several million combinations, under a budget of a couple of thousand function evaluations. Because the evaluation budget is limited, we argue that surrogate-assisted search methods become relevant for this application domain. To this end, we study a so-called Categorical Evolution Strategy (CatES). We systematically compare its performance, with and without the assistance of state-of-the-art surrogates, to DoE-based initialization of the search (exploiting the budget partially or entirely). Our empirical findings on noise-free benchmarks show that the surrogate-based approach is superior: it significantly outperforms both the DoE techniques and the CatES alone on the majority of the problems, subject to the budget constraint. We conclude by outlining the strengths and weaknesses of EAs versus DoE, when run either directly or surrogate-aided.
