Learning to Route with Deep RL

We investigate a novel and important application domain for deep RL: network routing. The question of whether and when traditional network protocol design, which relies on the application of algorithmic insights by human experts, can be replaced by a data-driven approach has recently attracted much attention. We explore this question in the context of what is arguably the most fundamental networking task: routing. Can ideas and techniques from machine learning be leveraged to automatically generate “good” routing configurations? We observe that the routing domain poses significant challenges for data-driven network protocol design and report preliminary results on the power of data-driven routing. Our results suggest that applying deep reinforcement learning in this context yields high performance and is thus a promising direction for further research. We conclude by outlining a research agenda for data-driven routing.

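To make the core claim concrete, the sketch below illustrates one natural way to frame routing as an RL problem; it is not the paper's actual method, and the topology, capacities, traffic model, and reward are all illustrative assumptions. The agent's action is a vector of per-edge link weights, demands are routed on weighted shortest paths, and the reward is the negative maximum link utilization. For brevity, a random-search hill climber stands in for a deep RL agent (e.g., DQN or policy gradients).

```python
# Minimal, illustrative sketch (not the paper's setup): routing as an RL-style
# loop where an agent picks per-edge link weights, traffic is routed on
# weighted shortest paths, and performance is max link utilization.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Hypothetical toy topology: a 6-node ring with two chords; uniform capacities.
G = nx.cycle_graph(6)
G.add_edges_from([(0, 3), (1, 4)])
edges = list(G.edges())
capacity = {e: 10.0 for e in edges}

def route(demands, weights):
    """Route each (src, dst, volume) demand on the weighted shortest path
    and return the resulting maximum link utilization."""
    for (u, v), w in zip(edges, weights):
        G[u][v]["weight"] = float(w)
    load = {e: 0.0 for e in edges}
    for src, dst, vol in demands:
        path = nx.shortest_path(G, src, dst, weight="weight")
        for u, v in zip(path, path[1:]):
            e = (u, v) if (u, v) in load else (v, u)
            load[e] += vol
    return max(load[e] / capacity[e] for e in edges)

def sample_demands():
    # Illustrative traffic: a few random src-dst pairs with random volumes.
    pairs = rng.choice(6, size=(4, 2))
    return [(int(s), int(d), float(rng.uniform(1, 5)))
            for s, d in pairs if s != d]

def avg_util(weights, trials=20):
    # Estimate expected max utilization under random demands (noisy estimate).
    return np.mean([route(sample_demands(), weights) for _ in range(trials)])

# Hill-climbing stand-in for the learning agent: perturb the weight vector
# and keep changes that reduce estimated max utilization. A real deep RL
# agent would replace this loop with a trained neural policy.
weights = np.ones(len(edges))
best = avg_util(weights)
for step in range(200):
    cand = np.clip(weights + rng.normal(0, 0.3, len(edges)), 0.1, None)
    score = avg_util(cand)
    if score < best:
        weights, best = cand, score
print(f"avg max-link-utilization after search: {best:.3f}")
```

In a full deep-RL treatment, the hill-climbing loop would be replaced by a neural policy trained with, for example, policy gradients, and the synthetic demands would come from measured traffic matrices.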