This Paper Is Included in the Proceedings of the 12th Usenix Symposium on Operating Systems Design and Implementation (osdi '16). Dqbarge: Improving Data-quality Tradeoffs in Large-scale Internet Services Dqbarge: Improving Data-quality Tradeoffs in Large-scale Internet Services

Modern Internet services often involve hundreds of distinct software components cooperating to handle a single user request. Each component must balance the competing goals of minimizing service response time and maximizing the quality of the service provided. This leads to low-level components making data-quality tradeoffs, which we define to be explicit decisions to return lower-fidelity data in order to improve response time or minimize resource usage. We first perform a comprehensive study of low-level data-quality tradeoffs at Facebook. We find that such tradeoffs are widespread. We also find that existing data-quality tradeoffs are often suboptimal because the low-level components making the tradeoffs lack global knowledge that could enable better decisions. Finally, we find that most tradeoffs are reactive, rather than proactive, and so waste resources and fail to mitigate system overload. Next, we develop DQBarge, a system that enables better data-quality tradeoffs by propagating critical information along the causal path of request processing. This information includes data provenance, load metrics, and critical path predictions. DQBarge generates performance and quality models that help low-level components make better, more proactive, tradeoffs. Our evaluation shows that DQBarge helps Internet services mitigate load spikes, improve utilization of spare resources, and implement dynamic capacity planning.

[1]  Ratul Mahajan,et al.  AppInsight: Mobile App Performance Monitoring in the Wild , 2022 .

[2]  Sridhar Ramaswamy,et al.  The Aqua approximate query answering system , 1999, SIGMOD '99.

[3]  Eric A. Brewer,et al.  Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.

[4]  Richard Wolski,et al.  Quorum: flexible quality of service for internet services , 2005, NSDI.

[5]  Rodrigo Fonseca,et al.  Pivot tracing , 2018, USENIX ATC.

[6]  Alan L. Cox,et al.  Whodunit: transactional profiling for multi-tier applications , 2007, EuroSys '07.

[7]  Ronald G. Dreslinski,et al.  Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.

[8]  Ratul Mahajan,et al.  Timecard: controlling user-perceived delays in server-based mobile applications , 2013, SOSP.

[9]  Tao Yang,et al.  Proceedings of the 5th Symposium on Operating Systems Design and Implementation Integrated Resource Management for Cluster-based Internet Services , 2022 .

[10]  Thu D. Nguyen,et al.  ApproxHadoop: Bringing Approximations to MapReduce Frameworks , 2015, ASPLOS.

[11]  HoffmannHenry,et al.  Dynamic knobs for responsive power-aware computing , 2011 .

[12]  Surajit Chaudhuri,et al.  Dynamic sample selection for approximate query processing , 2003, SIGMOD '03.

[13]  Mahadev Satyanarayanan,et al.  Agile application-aware adaptation for mobility , 1997, SOSP.

[14]  Richard Mortier,et al.  Using Magpie for Request Extraction and Workload Modelling , 2004, OSDI.

[15]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[16]  Raj Jain,et al.  The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.

[17]  Gautam Kumar,et al.  Hold 'em or fold 'em?: aggregation queries under performance variations , 2016, EuroSys.

[18]  Srikanth Kandula,et al.  Speeding up distributed request-response workflows , 2013, SIGCOMM.

[19]  Helen J. Wang,et al.  Online aggregation , 1997, SIGMOD '97.

[20]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.

[21]  James R. Larus,et al.  Zeta: scheduling interactive services with partial execution , 2012, SoCC '12.

[22]  Donald Beaver,et al.  Dapper, a Large-Scale Distributed Systems Tracing Infrastructure , 2010 .

[23]  Pete Wyckoff,et al.  Hive - A Warehousing Solution Over a Map-Reduce Framework , 2009, Proc. VLDB Endow..

[24]  Henry Hoffmann,et al.  Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.

[25]  Kimberly Keeton,et al.  LazyBase: trading freshness for performance in a scalable database , 2012, EuroSys '12.

[26]  Henry Hoffmann,et al.  JouleGuard: energy guarantees for approximate applications , 2015, SOSP.

[27]  Ion Stoica,et al.  BlinkDB: queries with bounded errors and bounded response times on very large data , 2012, EuroSys '13.

[28]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[29]  Randy H. Katz,et al.  X-Trace: A Pervasive Network Tracing Framework , 2007, NSDI.

[30]  Martin C. Rinard,et al.  Proving acceptability properties of relaxed nondeterministic approximate programs , 2012, PLDI.

[31]  Thomas F. Wenisch,et al.  The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services , 2014, OSDI.