A practical framework for adaptive metaheuristics

Local search methods are useful tools for tackling hard problems such as many combinatorial optimization problems (COPs). Experience from mathematics has shown that exploiting regularities in problem solving is beneficial, so identifying and exploiting regularities in the context of local search methods is desirable, too. Because of the complexity of the COPs tackled, such regularities are best detected and learned automatically, which can be achieved by extending existing local search methods with machine learning techniques. Learning requires feedback, but in the context of local search methods no instructive feedback is available. Evaluative feedback, however, can be derived, for example from the cost function of a COP evaluating individual solutions. Reinforcement learning (RL) is a machine learning technique that needs only evaluative feedback; one particular RL method is Q-learning.

The present thesis attempts to develop learning local search methods in a general and practical manner. One way to enhance local search methods with learning capabilities is to use RL methods. The direct application of existing RL techniques to existing local search methods is enabled by the concept of a local search agent (LSA): the advancement of a trajectory-based local search method can be regarded as the interaction of a virtual agent whose states essentially consist of solutions and whose actions are arbitrary hierarchical compositions of local search operators. An LSA using RL is then called a learning LSA. The change in cost for each move of a learning LSA can be used as its reward; from these rewards, returns can be computed such that maximizing the return reflects the goal of finding a global or at least a very good local optimum. The hierarchical structure of LSA actions allows the use of so-called ILS-actions.
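The agent view described above can be sketched as a minimal interaction loop in which the state is the current solution and the reward of a move is the resulting drop in cost. All names below are illustrative, not part of the GAILS framework:

```python
def lsa_episode(initial_solution, actions, cost, policy, steps=100):
    """One episode of a learning local search agent (LSA).

    The agent's state is the current solution; each action is a local
    search operator (or a composition of operators). The reward for a
    move is the change in cost, so maximizing the return corresponds
    to driving the cost down.
    """
    s = initial_solution
    history = []
    for _ in range(steps):
        a = policy(s, actions)           # choose an operator, e.g. greedily
        s_next = a(s)                    # apply it to obtain the next solution
        reward = cost(s) - cost(s_next)  # positive when the move improves cost
        history.append((s, a, reward, s_next))
        s = s_next
    return s, history
```

With a greedy policy on a toy one-dimensional problem (cost `abs(x)`, actions `x - 1` and `x + 1`), the rewards telescope to the total cost reduction of the trajectory.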
An ILS-action coincides with the application of one iteration of the well-known Iterated Local Search (ILS) metaheuristic. The advantage of this metaheuristic, and of this kind of action, is that only solutions from the subset of local optima are considered, a subset which must contain any acceptable solution. This introduces a search space abstraction which in turn can improve performance. A learning LSA that employs ILS-actions iteratively will visit local optima in a guided and adaptive manner. The resulting theoretical framework is called Guided Adaptive Iterated Local Search (GAILS).

In order to evaluate randomized GAILS algorithms, empirical experiments have to be conducted. Each GAILS algorithm consists of three mainly independent parts. The first part comprises the actions of a learning LSA, which are specific to a problem type; these actions, being arbitrary hierarchical compositions of local search operators, are implemented in terms of basic local search operators. The second part represents the RL techniques used, which use actions transparently and hence are independent of the problem type. The third part consists of the function approximators employed by the RL techniques; these require only a vector of real-valued features as input and are therefore independent of the first two parts. Empirical experimentation can be supported by a framework that decouples these three parts in any GAILS algorithm instantiation, allowing arbitrary reuse and combination and hence rapid prototyping. The GAILS implementation framework is such an application framework: it is designed for rapidly implementing learning LSAs and reflects the separation of a learning LSA into its three main parts. It provides generic interfaces between the components of the three parts and thereby separates problem-type-specific states from search control.
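One ILS-action, i.e. a perturbation, a subsidiary local search back down to a local optimum, and an acceptance decision between the old and new local optimum, can be sketched as follows (the function names are hypothetical, not the thesis' API):

```python
def ils_action(s, perturb, local_search, accept, cost):
    """One ILS-action: a single iteration of Iterated Local Search.

    Starting from a local optimum s, perturb it, descend back to a
    local optimum, and let an acceptance criterion decide which of
    the two local optima the agent keeps as its next state.
    """
    s_perturbed = perturb(s)
    s_new = local_search(s_perturbed)  # subsidiary descent to a local optimum
    return s_new if accept(cost(s), cost(s_new)) else s
```

Because the input and output are both local optima, an agent that repeatedly applies such actions moves only through the abstracted search space of local optima.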
It also separates the search control from the state of the search control unit. Hierarchically built actions are mapped to object hierarchies. Two GAILS algorithms based on ILS-actions, following the Q-learning variants Q(0) and Q(λ), were developed, implemented, and compared to corresponding standard implementations of the ILS metaheuristic. These so-called Q-ILS algorithms were tested on two problem types using different function approximators. The results showed that learning useful policies, and transferring what was learned across multiple problem instances, even instances of different sizes, is possible and useful.
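The value update a Q-ILS agent performs after each ILS-action can be sketched in tabular form. The thesis uses function approximators over real-valued feature vectors in place of a table, so this dict-based Q(0) version is a deliberate simplification with illustrative names:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q(0) update after observing transition (s, a, r, s_next).

    Q maps (state, action) pairs to value estimates; unseen pairs
    default to 0.0. The learning rate alpha and discount gamma are
    the usual Q-learning parameters.
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]
```

The Q(λ) variant additionally propagates the temporal-difference error backwards along the trajectory via eligibility traces.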
