Paper Information - Real-world Video Adaptation with Reinforcement Learning

Real-world Video Adaptation with Reinforcement Learning

Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE). We evaluate recently proposed RL-based ABR methods in Facebook's web-based video streaming platform. Real-world ABR poses several challenges that require customized designs beyond off-the-shelf RL algorithms: we implement a scalable neural network architecture that supports videos with arbitrary bitrate encodings; we design a training method to cope with the variance resulting from the stochasticity in network conditions; and we leverage constrained Bayesian optimization for reward shaping in order to optimize the conflicting QoE objectives. In a week-long worldwide deployment with more than 30 million video streaming sessions, our RL approach outperforms the existing human-engineered ABR algorithms.
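
To make the idea of conflicting QoE objectives concrete, here is a minimal Python sketch of a linear per-session reward that trades off video quality, rebuffering, and bitrate smoothness. The linear form, the `RewardWeights` fields, and the default weight values are illustrative assumptions and are not taken from the paper; they only show the kind of tunable weights that a reward-shaping search (such as the constrained Bayesian optimization mentioned above) could adjust.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights trading off QoE objectives (not from the paper)."""
    quality: float = 1.0      # reward per Mbps of the selected bitrate
    rebuffer: float = 4.0     # penalty per second of stalling
    smoothness: float = 1.0   # penalty per Mbps of change between chunks


def session_reward(
    bitrates_mbps: Sequence[float],
    rebuffer_secs: Sequence[float],
    weights: RewardWeights,
) -> float:
    """Sum an assumed linear QoE reward over the chunks of one session."""
    if len(bitrates_mbps) != len(rebuffer_secs):
        raise ValueError("need one rebuffer value per chunk")

    total = 0.0
    previous = None
    for bitrate, stall in zip(bitrates_mbps, rebuffer_secs):
        # Higher bitrate is rewarded; stalling is penalized.
        total += weights.quality * bitrate - weights.rebuffer * stall
        # Switching bitrate between consecutive chunks is penalized.
        if previous is not None:
            total -= weights.smoothness * abs(bitrate - previous)
        previous = bitrate
    return total


if __name__ == "__main__":
    # Three chunks: one upward switch and one half-second stall.
    print(session_reward([1.0, 2.0, 2.0], [0.0, 0.5, 0.0], RewardWeights()))
```

Changing the weights changes which behavior the reward favors: a larger `rebuffer` weight pushes a policy toward conservative bitrates, while a larger `quality` weight pushes it toward higher bitrates at the risk of more stalls. Searching over such weights is one plausible reading of reward shaping in this setting.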
