Online Safety Assurance for Learning-Augmented Systems

Recently, deep learning has been successfully applied to a variety of networking problems. A fundamental challenge is that when the operational environment of a learning-augmented system differs from its training environment, the system often makes poorly informed decisions, resulting in degraded performance. We argue that safely deploying learning-driven systems requires the ability to determine, in real time, whether system behavior is coherent, so that the system can default to a reasonable heuristic when it is not. We term this the online safety assurance problem (OSAP). We present three approaches to quantifying decision uncertainty, which differ in the signal used to infer uncertainty. We illustrate the usefulness of online safety assurance in the context of the recently proposed deep reinforcement learning (RL) approach to video streaming. While deep RL for video streaming outperforms other approaches when the operational and training environments match, it is dominated by simple heuristics when the two differ. Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits of ML without compromising on safety.
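
To make the fallback mechanism concrete, the sketch below illustrates one possible uncertainty signal: disagreement within an ensemble of learned policies. This is a minimal sketch under our own assumptions; the function names, the total-variation disagreement measure, and the threshold are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of uncertainty-gated fallback: an ensemble of learned
# policies proposes action distributions for the current state; if the
# members disagree beyond a threshold, the decision is treated as uncertain
# and control defers to a simple default heuristic. The threshold value is
# illustrative and would be tuned offline.
DISAGREEMENT_THRESHOLD = 0.5


def ensemble_action_distributions(policies, state):
    """Stack each ensemble member's action distribution for `state`."""
    return np.stack([policy(state) for policy in policies])


def decision_uncertainty(distributions):
    """Mean total-variation distance of each member from the ensemble mean.

    High disagreement between members is one possible signal that the
    current state lies outside the training distribution.
    """
    mean = distributions.mean(axis=0)
    return 0.5 * np.abs(distributions - mean).sum(axis=1).mean()


def safe_action(policies, default_heuristic, state):
    """Act with the learned ensemble unless its decision looks uncertain."""
    dists = ensemble_action_distributions(policies, state)
    if decision_uncertainty(dists) > DISAGREEMENT_THRESHOLD:
        return default_heuristic(state)        # fall back to the safe default
    return int(np.argmax(dists.mean(axis=0)))  # otherwise follow the ensemble
```

The gating structure is independent of the uncertainty signal: swapping in a different signal (e.g., a novelty score over inputs or a drop in observed reward) only changes decision_uncertainty, not the fallback logic.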
