OnRL: improving mobile video telephony via online reinforcement learning

Machine learning models, particularly reinforcement learning (RL), have demonstrated great potential in optimizing video streaming applications. However, state-of-the-art solutions are limited to an "offline learning" paradigm, i.e., the RL models are trained in simulators and then deployed in real networks. As a result, they inevitably suffer from the simulation-to-reality gap, performing far worse under real conditions than in simulated environments. In this work, we close the gap by proposing OnRL, an online RL framework for real-time mobile video telephony. OnRL places many individual RL agents directly into the video telephony system, where they make video bitrate decisions in real time and evolve their models over time. OnRL then aggregates these agents to form a high-level RL model that helps each individual agent react to unseen network conditions. Moreover, OnRL incorporates novel mechanisms to handle the adverse impacts of inherent video traffic dynamics and to eliminate the risk of quality degradation caused by the RL model's exploration attempts. We implement OnRL on a mainstream operational video telephony system, Alibaba Taobao-live. In a month-long evaluation with 543 hours of video sessions from 151 real-world mobile users, OnRL significantly outperforms prior algorithms, reducing video stalling rate by 14.22% while maintaining similar video quality.
