Microarchitectural implications of event-driven server-side web applications

Enterprise Web applications are moving towards server-side scripting using managed languages. Within this shifting context, event-driven programming is emerging as a crucial programming model for achieving scalability. In this paper, we study the microarchitectural implications of server-side scripting, JavaScript in particular, from the perspective of the event-driven programming model. Using the Node.js framework, we draw three microarchitectural conclusions. First, unlike traditional server workloads such as CloudSuite and BigDataBench, which are based on the conventional thread-based execution model, event-driven applications are heavily single-threaded, and as such they require significant single-thread performance. Second, single-thread performance is severely limited by the front-end inefficiencies of today's server processor microarchitecture, ultimately leading to overall execution inefficiencies. These front-end inefficiencies stem from the unique combination of limited intra-event code reuse and large inter-event reuse distance. Third, through a deep understanding of event-specific characteristics, architects can mitigate the front-end inefficiencies of managed-language-based event-driven execution by combining an instruction cache insertion policy with a prefetcher.
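To make the notion of inter-event reuse distance concrete, the following is a minimal sketch, not taken from the paper, that computes reuse distance over a toy trace of instruction cache blocks. The trace, the block names, and the `reuse_distances` helper are all hypothetical; reuse distance is taken here to mean the number of distinct blocks touched between two consecutive accesses to the same block.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """For each access, return the number of distinct blocks touched since the
    previous access to the same block (None for a first-time access)."""
    stack = OrderedDict()  # blocks ordered from least to most recently used
    distances = []
    for block in trace:
        if block in stack:
            keys = list(stack)
            # Blocks after this one in LRU order were touched more recently.
            distances.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            distances.append(None)
            stack[block] = True
    return distances


# Hypothetical trace: event A runs, then event B, then event A again.
# Within one event each block is touched once (limited intra-event reuse);
# A's blocks are only reused after all of B's blocks have intervened.
event_a = ["a0", "a1", "a2"]
event_b = ["b0", "b1", "b2", "b3"]
trace = event_a + event_b + event_a

distances = reuse_distances(trace)
print(distances)
```

In this toy trace every reuse of an `a` block is separated from its previous access by six distinct blocks. In a real workload the intervening footprint of other events could exceed the instruction cache capacity, which is the kind of situation the abstract attributes the front-end inefficiencies to.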
