Tail latency in Node.js: energy-efficient turbo boosting for long-latency requests in event-driven web services

Cloud-based Web services are shifting to an event-driven, scripting-language-based programming model to achieve productivity, flexibility, and scalability. Implementations of this model, however, generally suffer from long tail latencies, which we measure using Node.js as a case study. Unlike in traditional thread-based systems, reducing long tails is difficult in event-driven systems because of their inherently asynchronous programming model. We propose a framework to identify and optimize the sources of tail latency in scripted event-driven Web services. We introduce profiling that reveals not only how asynchronous event-driven execution affects application tail latency but also how managed runtime system overhead exacerbates it. Using the profiling framework, we propose an event-driven execution runtime design that orchestrates the hardware’s boosting capabilities to reduce tail latency. We achieve greater tail latency reductions with lower energy overhead than prior techniques that are unaware of the underlying event-driven program execution model. The lessons we derive from Node.js apply to other event-driven services based on scripting language frameworks.
