Deep reinforced bitrate ladders for adaptive video streaming

In a typical transcoding pipeline for adaptive video streaming, raw videos are pre-chunked and pre-encoded on the server side according to a set of resolution-bitrate or resolution-quality pairs, commonly called a bitrate ladder. In contrast to existing heuristics, we argue that a good bitrate ladder should be optimized by jointly considering video content features, network capacity, and cloud storage costs. We propose DeepLadder, a per-chunk optimization scheme that uses a state-of-the-art deep reinforcement learning (DRL) method to optimize the bitrate ladder with respect to these three concerns. Technically, DeepLadder selects the setting for each video resolution autoregressively, so that each choice is conditioned on the settings already chosen. To train DeepLadder efficiently, we use over 8,000 video chunks, measure over 1,000,000 perceptual video qualities, collect more than 50 hours of real-world network traces, and build faithful virtual environments. Across a series of comprehensive experiments on both Constant Bitrate (CBR) and Variable Bitrate (VBR) encoded videos, we demonstrate significant improvements in average video quality, bandwidth utilization, and storage overhead compared with prior work, as well as the ability to be deployed in a real-world transcoding framework.
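As a rough illustration of what autoregressive selection means here, the sketch below picks one bitrate per resolution in order, with each choice able to see the rungs chosen so far. It is not DeepLadder's implementation: the function names, the candidate bitrates, the per-resolution targets, and the hand-written `toy_score` (a stand-in for a learned policy) are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

# One rung of the ladder: (resolution label, bitrate in kbps).
Rung = Tuple[str, int]


def build_ladder(
    resolutions: Sequence[str],
    candidates: Sequence[int],
    score: Callable[[List[Rung], str, int], float],
) -> List[Rung]:
    """Pick one bitrate per resolution, conditioning on earlier picks."""
    ladder: List[Rung] = []
    for res in resolutions:
        # The score sees the partial ladder, which is what makes the
        # procedure autoregressive rather than independent per resolution.
        best = max(candidates, key=lambda b: score(ladder, res, b))
        ladder.append((res, best))
    return ladder


# Hypothetical target bitrates; a learned policy would replace this table.
TOY_TARGETS = {"360p": 800, "720p": 2400, "1080p": 5000}


def toy_score(ladder: List[Rung], res: str, bitrate: int) -> float:
    """Assumed scoring rule: stay near the target, never go below the last rung."""
    previous = ladder[-1][1] if ladder else 0
    penalty = 1e6 if bitrate <= previous else 0.0
    return -abs(bitrate - TOY_TARGETS[res]) - penalty


ladder = build_ladder(
    ["360p", "720p", "1080p"],
    [500, 1000, 2000, 3000, 4500, 6000],
    toy_score,
)
print(ladder)
```

In a DRL setting the scoring rule would presumably be replaced by a policy network whose reward reflects video quality, bandwidth utilization, and storage overhead; the loop structure is the only part this sketch is meant to convey.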
