Enabling Data-Driven Optimization of Quality of Experience for Internet Applications

Today's Internet has become an eyeball economy dominated by applications such as video streaming and VoIP. Because most applications rely on user engagement to generate revenue, maintaining high user-perceived QoE (Quality of Experience) has become crucial. For instance, one short buffering interruption leads to 39% less time spent watching videos and causes significant revenue losses for ad-based video sites. Despite rising expectations for high QoE, existing approaches fall short of the QoE that today's applications need. They either require costly re-architecting of the network core, or rely on suboptimal endpoint-based protocols that react to dynamic Internet performance with only limited knowledge of the network. In this talk, I will present a new approach inspired by the recent success of data-driven approaches in many fields of computing. I will demonstrate that data-driven techniques can improve Internet QoE by utilizing a centralized real-time view of performance across millions of endpoints (clients). I will focus on two fundamental challenges unique to applying data-driven approaches in networking: the need for expressive models to capture the complex factors affecting QoE, and the need for scalable platforms to make real-time decisions with fresh data from geo-distributed clients. Our solutions address these challenges in practice by integrating several domain-specific insights about networked applications with machine learning algorithms and systems, and they achieve better QoE than many standard machine learning solutions. I will present end-to-end systems that yield substantial QoE improvement and higher user engagement for video streaming and VoIP. Two of my projects, CFA and VIA, have been used in industry by Conviva and Skype, companies that specialize in QoE optimization for video streaming and VoIP, respectively.
