A Multi-step and Resilient Predictive Q-learning Algorithm for IoT with Human Operators in the Loop: A Case Study in Water Supply Networks

We consider the problem of recommending resilient and predictive actions for an IoT network with faulty components, where human operators manipulate the information the agent sees about the environment for containment purposes. The IoT network is formulated as a directed graph with known topology, whose objective is to maintain a constant and resilient flow between a source node and a destination node. The optimal route through this network is computed via a predictive and resilient Q-learning algorithm that takes into account both historical data about irregular operation due to faults and feedback from human operators, who are assumed to have extra information about the status of the network, namely about locations likely to be targeted by attacks. To showcase our method, we use anonymized data from Arlington County, Virginia, to compute predictive and resilient scheduling policies for a smart water supply system that avoid (i) all locations indicated as attacked by human operators and (ii) as many neighborhoods with detected leaks or other faults as possible. The method combines the adaptability of the human with the computational capability of the machine to achieve optimal implementation of containment and recovery actions in water distribution.
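To make the routing mechanism concrete, here is a minimal sketch in Python, not taken from the paper: a standard one-step tabular Q-learning loop stands in for the paper's multi-step predictive variant. Nodes flagged by operators are pruned from the action set as a hard constraint (avoidance requirement (i) above), while historically faulty nodes add a soft cost penalty (requirement (ii)). The graph topology, node names, costs, penalty values, and hyperparameters are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical topology: nodes are junctions in the water network, edge
# weights are traversal costs. All values below are illustrative, not
# values from the paper or the Arlington County data.
GRAPH = {
    "source": {"A": 1.0, "B": 1.0},
    "A": {"C": 1.0, "sink": 3.0},
    "B": {"C": 1.0},
    "C": {"sink": 1.0},
    "sink": {},
}
ATTACKED = {"B"}             # operator feedback: hard constraint, never enter
FAULTY_PENALTY = {"C": 2.0}  # historical fault data: soft penalty

GAMMA, ALPHA, EPS = 0.95, 0.5, 0.2  # discount, learning rate, exploration

def allowed_actions(node):
    # Operator-flagged nodes are removed from the action set entirely.
    return [n for n in GRAPH[node] if n not in ATTACKED]

Q = defaultdict(float)  # Q[(node, next_node)] = estimated cost-to-go

for _ in range(2000):
    node = "source"
    while node != "sink":
        acts = allowed_actions(node)
        if not acts:
            break  # dead end after pruning attacked nodes
        # Epsilon-greedy over the pruned action set (minimizing cost).
        a = random.choice(acts) if random.random() < EPS else \
            min(acts, key=lambda n: Q[(node, n)])
        # Step cost = edge cost + soft penalty for historically faulty nodes.
        cost = GRAPH[node][a] + FAULTY_PENALTY.get(a, 0.0)
        best_next = min((Q[(a, n)] for n in allowed_actions(a)), default=0.0)
        Q[(node, a)] += ALPHA * (cost + GAMMA * best_next - Q[(node, a)])
        node = a

# Greedy route extraction after training.
node, route = "source", ["source"]
while node != "sink":
    node = min(allowed_actions(node), key=lambda n: Q[(node, n)])
    route.append(node)
print(" -> ".join(route))  # e.g. source -> A -> sink
```

In this toy instance the learned policy routes around both the attacked node B (excluded outright) and the faulty node C (made expensive by the penalty), which mirrors the split in the abstract between hard operator constraints and soft fault-history costs.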