Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming

Existing reinforcement learning (RL)-based adaptive bitrate (ABR) approaches outperform earlier methods based on fixed control rules by improving the Quality of Experience (QoE) score. However, the QoE metric can hardly provide clear guidance for optimization, which can result in unexpected strategies. In this paper, we propose Tiyuntsong, a self-play reinforcement learning approach with a generative adversarial network (GAN)-based method for ABR video streaming. Tiyuntsong learns strategies automatically by training two agents that compete against each other. The competition results are determined by a set of rules rather than a numerical QoE score, which allows clearer optimization objectives. Meanwhile, we propose a GAN Enhancement Module that extracts hidden features from past status, preserving information without being limited by sequence length. Using testbed experiments, we show that the use of the GAN significantly improves Tiyuntsong's performance. By comparing the performance of ABR algorithms, we observe that Tiyuntsong also outperforms existing ABR algorithms on the underlying metrics.
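
To illustrate what deciding a competition "by a set of rules rather than a numerical QoE score" could look like, the following is a minimal sketch only. The abstract does not state the actual rules, so the rule order used here (less rebuffering wins first, then higher average bitrate) and the names SessionResult and compare are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionResult:
    """Summary of one playback session (hypothetical fields)."""
    rebuffer_s: float         # total stall time, in seconds
    mean_bitrate_kbps: float  # average selected bitrate, in kbps


def compare(a: SessionResult, b: SessionResult, tol: float = 1e-9) -> int:
    """Return 1 if `a` wins, -1 if `b` wins, 0 for a draw.

    Assumed rule order (not taken from the paper):
      1. the session with less rebuffering wins;
      2. on a tie, the session with the higher mean bitrate wins.
    No weighted QoE score is computed at any point.
    """
    if abs(a.rebuffer_s - b.rebuffer_s) > tol:
        return 1 if a.rebuffer_s < b.rebuffer_s else -1
    if abs(a.mean_bitrate_kbps - b.mean_bitrate_kbps) > tol:
        return 1 if a.mean_bitrate_kbps > b.mean_bitrate_kbps else -1
    return 0


# Two agents stream the same video over the same network trace.
agent_a = SessionResult(rebuffer_s=0.0, mean_bitrate_kbps=1200.0)
agent_b = SessionResult(rebuffer_s=2.0, mean_bitrate_kbps=3000.0)
outcome = compare(agent_a, agent_b)  # agent_a wins under the assumed rules
```

Under such a scheme, each agent would receive a win, loss, or draw signal from the comparison rather than a weighted sum of metrics, so no metric weights need to be tuned.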
