Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems

This paper studies how to schedule wireless transmissions from sensors in order to estimate the states of multiple remote, dynamic processes. Sensors make observations of each of the processes, and the resulting information has to be transmitted to a central gateway over a wireless network for monitoring purposes. Typically, fewer wireless channels are available than there are processes to be monitored. Such estimation problems routinely occur in large-scale Cyber-Physical Systems, especially when the dynamic processes involved are geographically separated. For effective estimation at the gateway, the sensors need to be scheduled appropriately, i.e., at each time instant it must be decided which sensors are granted network access and which are not. To solve this scheduling problem, we formulate an associated Markov decision process (MDP). We then solve this MDP using a Deep Q-Network, a deep reinforcement learning algorithm that is at once scalable and model-free. We compare our scheduling algorithm against popular alternatives such as round-robin and reduced-waiting-time, and show that it significantly outperforms them in randomly generated example scenarios.
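
To make the setting concrete, the listing below is a minimal sketch (not the authors' implementation) of DQN-based sensor scheduling. It assumes N scalar Gauss-Markov processes, a single shared channel (one sensor scheduled per slot), and i.i.d. packet drops with delivery probability P_SUCCESS. The MDP state is the vector of estimation error variances at the gateway, the action is the index of the scheduled sensor, and the per-slot cost is the summed error variance. All names, dimensions, and hyperparameters are illustrative; the replay buffer, target network, and epsilon-greedy exploration are standard DQN machinery.

import random
from collections import deque

import torch
import torch.nn as nn

N = 4            # number of dynamic processes / sensors (illustrative)
A2 = 1.1 ** 2    # squared gain a^2 of each scalar process (unstable: |a| > 1)
W = 1.0          # process noise variance
P_SUCCESS = 0.8  # packet-delivery probability of the shared channel

def step(variances, action):
    # One time slot of the error-variance recursion: every process
    # open-loop predicts; the scheduled sensor's variance resets to
    # zero if its packet is delivered.
    nxt = [A2 * v + W for v in variances]
    if random.random() < P_SUCCESS:
        nxt[action] = 0.0
    return nxt, sum(nxt)  # next state and per-slot cost

# Q-network maps the vector of error variances to a cost-to-go per sensor.
qnet = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, N))
target = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, N))
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)
GAMMA, EPS, BATCH = 0.95, 0.1, 64

state = [W] * N
for t in range(20_000):
    if random.random() < EPS:  # epsilon-greedy exploration
        action = random.randrange(N)
    else:
        with torch.no_grad():
            s = torch.tensor(state, dtype=torch.float32)
            # Schedule the sensor with the lowest predicted cost-to-go.
            action = int(qnet(s).argmin())
    nxt, cost = step(state, action)
    replay.append((state, action, cost, nxt))
    state = nxt

    if len(replay) >= BATCH:
        batch = random.sample(list(replay), BATCH)
        ss = torch.tensor([b[0] for b in batch], dtype=torch.float32)
        aa = torch.tensor([b[1] for b in batch]).unsqueeze(1)
        cc = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        ns = torch.tensor([b[3] for b in batch], dtype=torch.float32)
        q = qnet(ss).gather(1, aa).squeeze(1)
        with torch.no_grad():
            # Bellman target uses min, since we accumulate costs, not rewards.
            y = cc + GAMMA * target(ns).min(dim=1).values
        loss = nn.functional.mse_loss(q, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    if t % 500 == 0:
        target.load_state_dict(qnet.state_dict())  # periodic target-network sync

Because the objective is to minimize estimation error rather than maximize reward, the network's outputs are treated as costs-to-go: the greedy policy takes an argmin and the Bellman backup uses a min. The same sketch scales to multiple channels by letting the action select a subset of sensors, at the cost of a combinatorial action space.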
