Efficient Execution of Bursty Applications

The performance of user-facing applications is critical to client platforms. Many of these applications are event-driven and exhibit "bursty" behavior: the application is generally idle but generates bursts of activity in response to human interaction. We study one example of a bursty application, the web browser, and produce two important insights: (1) activity bursts contain false parallelism, bringing many cores out of deep sleep to inefficiently render a single webpage, and (2) these bursts are highly compute-driven, and thus their performance scales nearly linearly with frequency. On real hardware, we show an average performance gain of 14% and an average energy reduction of 17% by statically moving threads from multiple cores to a single core. We then propose dynamic, hardware-driven thread migration and scheduling enhancements that detect these bursts, leading to further benefits.
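
As a rough illustration of what statically placing an application's threads on a single core can look like in software, the sketch below restricts a task's CPU affinity to one core. This is not the mechanism used in the study, which the abstract does not describe; it assumes a Linux system where Python exposes `os.sched_setaffinity`, and the helper name `pin_to_single_core` is hypothetical.

```python
import os
from typing import Optional, Set


def pin_to_single_core(pid: int = 0, core: Optional[int] = None) -> Optional[Set[int]]:
    """Restrict task `pid` (0 = the caller) to a single core.

    Hypothetical helper for illustration only. Returns the new affinity
    set, or None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity control is not exposed on this platform

    allowed = os.sched_getaffinity(pid)
    # Assumption: when no core is requested, pick the lowest-numbered allowed core.
    target = min(allowed) if core is None else core
    if target not in allowed:
        raise ValueError(f"core {target} is not in the allowed set {sorted(allowed)}")

    # On Linux the underlying call applies to one thread; threads created
    # afterwards inherit the mask. Threads that already exist would need
    # the same call applied to each of them.
    os.sched_setaffinity(pid, {target})
    return os.sched_getaffinity(pid)
```

Calling `pin_to_single_core()` before an application spawns its worker threads keeps those threads on one core, so a burst of activity does not wake additional cores. Whether that helps depends on the workload; the gains reported above are specific to the web-browsing bursts studied.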
