A Response Surface Model Approach to Parameter Estimation of Reinforcement Learning for the Travelling Salesman Problem

This paper reports the use of a response surface model (RSM) together with reinforcement learning (RL) to solve the travelling salesman problem (TSP). In contrast to heuristic approaches to choosing RL parameters, the proposed method allows a systematic estimation of the learning rate and the discount factor. The Q-learning and SARSA algorithms were applied to standard instances from the TSPLIB library. Computational results show that RSM-based parameter estimation produces better solutions for both symmetric and asymmetric TSP instances.
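The idea of estimating the learning rate and discount factor from a fitted surface can be illustrated with a minimal sketch. It assumes a second-order polynomial surface in the two parameters, fitted by ordinary least squares; the abstract does not state the model form, so this is an assumption. The function names (`solve_linear`, `fit_quadratic_surface`, `stationary_point`) are hypothetical, and the responses are synthetic values standing in for tour lengths, not results from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic_surface(alphas, gammas, responses):
    """Least-squares fit of
    y = b0 + b1*a + b2*g + b11*a^2 + b22*g^2 + b12*a*g
    via the normal equations (assumed model form)."""
    rows = [[1.0, a, g, a * a, g * g, a * g] for a, g in zip(alphas, gammas)]
    k = 6
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(k)]
    return solve_linear(xtx, xty)


def stationary_point(coefs):
    """Point where both partial derivatives of the fitted surface vanish."""
    _, b1, b2, b11, b22, b12 = coefs
    hessian = [[2 * b11, b12], [b12, 2 * b22]]
    return tuple(solve_linear(hessian, [-b1, -b2]))


# Synthetic 5x5 factorial design over (learning rate, discount factor).
levels = [0.1, 0.3, 0.5, 0.7, 0.9]
alphas = [a for a in levels for _ in levels]
gammas = [g for _ in levels for g in levels]
# Made-up response with its minimum at alpha = 0.3, gamma = 0.7.
tour_lengths = [100 + 40 * (a - 0.3) ** 2 + 25 * (g - 0.7) ** 2
                for a, g in zip(alphas, gammas)]

coefs = fit_quadratic_surface(alphas, gammas, tour_lengths)
best_alpha, best_gamma = stationary_point(coefs)
```

In a real experiment each response would be the tour length obtained by running Q-learning or SARSA at the given parameter pair. The stationary point is a minimum only if the fitted quadratic is convex, so its second-order coefficients should be checked before the estimate is used.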
