Real-world Video Adaptation with Reinforcement Learning

Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE).We evaluate recently proposed RL-based ABR methods in Facebook’s web-based video streaming platform. Real-world ABR contains several challenges that requires customized designs beyond off-the-shelf RL algorithms — we implement a scalable neural network architecture that supports videos with arbitrary bitrate encodings; we design a training method to cope with the variance resulting from the stochasticity in network conditions; and we leverage constrained Bayesian optimization for reward shaping in order to optimize the conflicting QoE objectives. In a week-long worldwide deployment with more than 30 million video streaming sessions, our RL approach outperforms the existing human-engineered ABR algorithms.

[1]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[2]  Ali C. Begen,et al.  An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP , 2011, MMSys.

[3]  Hongzi Mao,et al.  Variance Reduction for Reinforcement Learning in Input-Driven Environments , 2018, ICLR.

[4]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[5]  Iraj Sodagar,et al.  The MPEG-DASH Standard for Multimedia Streaming Over the Internet , 2011, IEEE MultiMedia.

[6]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[7]  Filip De Turck,et al.  A learning-based algorithm for improved bandwidth-awareness of adaptive streaming clients , 2015, 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM).

[8]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[9]  Ufuk Topcu,et al.  Safe Reinforcement Learning via Shielding , 2017, AAAI.

[10]  Eytan Bakshy,et al.  Bayesian Optimization for Policy Search via Online-Offline Experimentation , 2019, J. Mach. Learn. Res..

[11]  Nando de Freitas,et al.  Taking the Human Out of the Loop: A Review of Bayesian Optimization , 2016, Proceedings of the IEEE.

[12]  Cisco Visual Networking Index: Forecast and Methodology 2016-2021.(2017) http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual- networking-index-vni/complete-white-paper-c11-481360.html. High Efficiency Video Coding (HEVC) Algorithms and Architectures https://jvet.hhi.fraunhofer. , 2017 .

[13]  Ramesh K. Sitaraman,et al.  Video Stream Quality Impacts Viewer Behavior: Inferring Causality Using Quasi-Experimental Designs , 2012, IEEE/ACM Transactions on Networking.

[14]  Pieter Abbeel,et al.  Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.

[15]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[16]  Christian Timmerer,et al.  Dynamic adaptive streaming over HTTP dataset , 2012, MMSys '12.

[17]  George Zyskind,et al.  On Best Linear Estimation and General Gauss-Markov Theorem in Linear Models with Arbitrary Nonnegative Covariance Structure , 1969 .

[18]  Peter I. Frazier,et al.  A Tutorial on Bayesian Optimization , 2018, ArXiv.

[19]  Herke van Hoof,et al.  Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.

[20]  Te-Yuan Huang,et al.  A buffer-based approach to rate adaptation: evidence from a large video streaming service , 2015, SIGCOMM 2015.

[21]  Yuandong Tian,et al.  ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.

[22]  Zhi-Li Zhang,et al.  Vivisecting YouTube: An active measurement study , 2012, 2012 Proceedings IEEE INFOCOM.

[23]  Ramesh K. Sitaraman,et al.  BOLA: Near-Optimal Bitrate Adaptation for Online Videos , 2016, IEEE/ACM Transactions on Networking.

[24]  Hongzi Mao,et al.  Neural Adaptive Video Streaming with Pensieve , 2017, SIGCOMM.

[25]  Bruno Sinopoli,et al.  A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP , 2015, Comput. Commun. Rev..

[26]  Michael Fairbank,et al.  The divergence of reinforcement learning algorithms with value-iteration and function approximation , 2011, The 2012 International Joint Conference on Neural Networks (IJCNN).

[27]  Lex Weaver,et al.  The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.

[28]  Craig Boutilier,et al.  Non-delusional Q-learning and value-iteration , 2018, NeurIPS.

[29]  Filip De Turck,et al.  Design of a Q-learning-based client quality selection algorithm for HTTP adaptive video streaming , 2013, ALA 2013.

[30]  Vyas Sekar,et al.  Understanding the impact of video quality on user engagement , 2011, SIGCOMM.

[31]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[32]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[33]  Peter L. Bartlett,et al.  Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..

[34]  Guilherme Ottoni,et al.  Constrained Bayesian Optimization with Noisy Experiments , 2017, Bayesian Analysis.