Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

Abstract

Dynamic multi-objective optimisation problems (DMOPs) pose a great challenge to reinforcement learning (RL) research because of their dynamic nature: objective functions, constraints and problem parameters may change over time. This study identifies the shortcomings of existing multi-objective optimisation benchmarks for dynamic environments in RL settings. To address them, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. The modified testbed captures the changing aspects of a dynamic environment, with its characteristics changing over time. To the authors' knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment. It maintains equilibrium by mapping different objectives simultaneously, providing the best compromise solution that is close to the true Pareto front (PF). As a proof of concept, the developed algorithm has been implemented to build an expert system for a real-world scenario, using a Markov decision process to identify vulnerable zones based on water quality resilience in Sao Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that PQDQN performs better than other state-of-the-art solutions in both the simulated and the real-world scenarios.
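The abstract does not specify how the modified DST changes over time. The sketch below is a rough illustration of why a time-dependent reward makes the true Pareto front itself move. It is not the authors' testbed.

The sketch makes the following assumptions:
- It uses the commonly cited DST layout: 10 treasure columns with base values 1 to 124 and a -1 time penalty per step.
- It applies a hypothetical change rule, `treasure_values`: every `period` steps, each treasure is re-weighted by 1 - severity, 1 or 1 + severity.
- The names `DynamicDST`, `period`, `severity` and `front_at` are illustrative only.

```python
# Commonly used DST layout (assumed): sea-bed depth of each column and base treasure values.
DEPTHS = [1, 2, 3, 4, 4, 4, 7, 7, 9, 10]
BASE_VALUES = [1, 2, 3, 5, 8, 16, 24, 50, 74, 124]


def treasure_values(t: int, period: int = 100, severity: float = 0.5) -> list[float]:
    """Hypothetical change rule: the values are re-weighted once per `period` steps.

    Each treasure is scaled by (1 - severity), 1 or (1 + severity), depending on
    its column and the current change epoch, so Pareto membership can change.
    """
    epoch = t // period
    return [v * (1 + severity * ((i + epoch) % 3 - 1)) for i, v in enumerate(BASE_VALUES)]


class DynamicDST:
    """Grid world: the submarine starts at (row 0, col 0); a treasure lies at row DEPTHS[col]."""

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, period: int = 100, severity: float = 0.5):
        self.period = period
        self.severity = severity
        self.t = 0  # global time step; it persists across episodes and drives the changes
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action: int):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Moves off the grid or below the sea bed leave the submarine where it is.
        if 0 <= c < len(DEPTHS) and 0 <= r <= DEPTHS[c]:
            self.pos = (r, c)
        self.t += 1
        r, c = self.pos
        done = r == DEPTHS[c]  # reaching a treasure cell ends the episode
        values = treasure_values(self.t, self.period, self.severity)
        treasure = values[c] if done else 0.0
        # Vector reward: (treasure objective, time-penalty objective), both maximised.
        return self.pos, (treasure, -1.0), done


def pareto_front(outcomes):
    """Return the non-dominated outcome vectors (every objective maximised)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [o for o in outcomes if not any(dominates(p, o) for p in outcomes)]


def front_at(t: int, period: int = 100, severity: float = 0.5):
    """True front at time t: the shortest path to column c takes c + DEPTHS[c] steps."""
    values = treasure_values(t, period, severity)
    outcomes = [(values[c], -float(c + DEPTHS[c])) for c in range(len(DEPTHS))]
    return pareto_front(outcomes)


if __name__ == "__main__":
    print(front_at(0))    # front during change epoch 0
    print(front_at(100))  # a different set of treasures is Pareto-optimal in epoch 1
```

With `severity=0.0` the rule reduces to the static DST, where all ten treasures lie on the front. With a non-zero severity, a policy that was Pareto-optimal before a change may become dominated afterwards. This is the kind of behaviour a dynamic multi-objective benchmark has to exercise.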
