Learning to Coordinate Video Codec with Transport Protocol for Mobile Video Telephony

Despite the pervasive use of real-time video telephony services, users' quality of experience (QoE) remains unsatisfactory, especially over the mobile Internet. Previous work studied the problem through controlled experiments, but a systematic, in-depth investigation in the wild is still missing. To bridge the gap, we conduct a large-scale measurement campaign on \appname, an operational mobile video telephony service. Our measurement logs fine-grained performance metrics over 1 million video call sessions. Our analysis shows that the application-layer video codec and the transport-layer protocol remain highly uncoordinated, which is a major cause of the low QoE. We therefore propose \name, a machine-learning-based framework that resolves the issue. Instead of blindly following the transport layer's estimate of network capacity, \name reviews historical logs from both layers and extracts high-level features of codec and network dynamics, from which it determines the highest bitrate for forthcoming video frames that does not incur congestion. To attain this ability, we train \name on the aforementioned measurement traces using a custom-designed imitation learning algorithm, which enables \name to learn from past experience. We have implemented \name and incorporated it into \appname. Our experiments show that \name outperforms state-of-the-art solutions, improving video quality while reducing stalling time severalfold in various practical scenarios.
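
To make the idea of learning bitrate decisions from logged traces concrete, the following is a minimal sketch under stated assumptions, not the algorithm used by \name. It assumes a hypothetical bitrate ladder (`LADDER_KBPS`), a hindsight "expert" that picks the highest rung fitting the worst upcoming capacity, two hand-picked features (the mean recent transport estimate and the mean recent codec output), and plain behavior cloning with a 1-nearest-neighbour lookup in place of the custom-designed imitation learning algorithm. All function and variable names are illustrative.

```python
# Illustrative sketch only. The ladder, features, and learner are assumptions,
# not the design described in the paper.
from statistics import mean

LADDER_KBPS = [200, 400, 800, 1200]  # hypothetical bitrate choices, ascending


def expert_bitrate(future_capacity_kbps):
    """Hindsight expert: highest rung that fits the worst upcoming capacity."""
    limit = min(future_capacity_kbps)
    fitting = [b for b in LADDER_KBPS if b <= limit]
    # Fall back to the lowest rung when even that exceeds the capacity.
    return fitting[-1] if fitting else LADDER_KBPS[0]


def features(transport_est_kbps, codec_out_kbps):
    """Summarize the recent history of both layers as a small feature vector."""
    return (mean(transport_est_kbps), mean(codec_out_kbps))


def build_dataset(trace, history=3, horizon=2):
    """Pair each logged state with the expert's action.

    trace: list of (transport_estimate, codec_output, true_capacity) per step.
    """
    data = []
    for t in range(history, len(trace) - horizon + 1):
        past = trace[t - history:t]
        future = trace[t:t + horizon]
        x = features([p[0] for p in past], [p[1] for p in past])
        y = expert_bitrate([f[2] for f in future])
        data.append((x, y))
    return data


def predict(dataset, x):
    """Behavior cloning via 1-NN: copy the expert action of the closest state."""
    def squared_distance(a):
        return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    return min(dataset, key=lambda item: squared_distance(item[0]))[1]


if __name__ == "__main__":
    # Synthetic trace: capacity drops from about 1000 kbps to about 450 kbps.
    trace = [
        (1000, 900, 1000), (1000, 900, 1000), (1000, 900, 1000),
        (500, 450, 500), (500, 450, 450), (500, 450, 450),
    ]
    dataset = build_dataset(trace)
    print(predict(dataset, (990, 880)))
```

The expert here uses future capacity that is only available offline, which is why it can serve as a training label but not as an online policy; the learned lookup sees only past observations from both layers.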
