Learning to Route with Deep RL

We investigate a novel and important application domain for deep RL: network routing. The question of whether and when traditional network protocol design, which relies on the application of algorithmic insights by human experts, can be replaced by a data-driven approach has recently attracted much attention. We explore this question in the context of what is arguably the most fundamental networking task: routing. Can ideas and techniques from machine learning be leveraged to automatically generate “good” routing configurations? We observe that the routing domain poses significant challenges for data-driven network protocol design and report preliminary results on the power of data-driven routing. Our results suggest that applying deep reinforcement learning in this context yields high performance and is thus a promising direction for further research. We conclude by outlining a research agenda for data-driven routing.

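To make the core claim concrete, the sketch below illustrates one natural way to frame routing as an RL problem; it is not the paper's actual method, and the topology, capacities, traffic model, and reward are all illustrative assumptions. The agent's action is a vector of per-edge link weights, demands are routed on weighted shortest paths, and the reward is the negative maximum link utilization. For brevity, a random-search hill climber stands in for a deep RL agent (e.g., DQN or policy gradients).

```python
# Minimal, illustrative sketch (not the paper's setup): routing as an RL-style
# loop where an agent picks per-edge link weights, traffic is routed on
# weighted shortest paths, and performance is max link utilization.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Hypothetical toy topology: a 6-node ring with two chords; uniform capacities.
G = nx.cycle_graph(6)
G.add_edges_from([(0, 3), (1, 4)])
edges = list(G.edges())
capacity = {e: 10.0 for e in edges}

def route(demands, weights):
    """Route each (src, dst, volume) demand on the weighted shortest path
    and return the resulting maximum link utilization."""
    for (u, v), w in zip(edges, weights):
        G[u][v]["weight"] = float(w)
    load = {e: 0.0 for e in edges}
    for src, dst, vol in demands:
        path = nx.shortest_path(G, src, dst, weight="weight")
        for u, v in zip(path, path[1:]):
            e = (u, v) if (u, v) in load else (v, u)
            load[e] += vol
    return max(load[e] / capacity[e] for e in edges)

def sample_demands():
    # Illustrative traffic: a few random src-dst pairs with random volumes.
    pairs = rng.choice(6, size=(4, 2))
    return [(int(s), int(d), float(rng.uniform(1, 5)))
            for s, d in pairs if s != d]

def avg_util(weights, trials=20):
    # Estimate expected max utilization under random demands (noisy estimate).
    return np.mean([route(sample_demands(), weights) for _ in range(trials)])

# Hill-climbing stand-in for the learning agent: perturb the weight vector
# and keep changes that reduce estimated max utilization. A real deep RL
# agent would replace this loop with a trained neural policy.
weights = np.ones(len(edges))
best = avg_util(weights)
for step in range(200):
    cand = np.clip(weights + rng.normal(0, 0.3, len(edges)), 0.1, None)
    score = avg_util(cand)
    if score < best:
        weights, best = cand, score
print(f"avg max-link-utilization after search: {best:.3f}")
```

In a full deep-RL treatment, the hill-climbing loop would be replaced by a neural policy trained with, for example, policy gradients, and the synthetic demands would come from measured traffic matrices.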