Performance Characterization of .NET Benchmarks

Managed-language frameworks are pervasive today, especially in modern datacenters. .NET is one such framework: it is used widely in Microsoft Azure but has not been well studied. Applications built on these frameworks behave differently from traditional SPEC-like programs because of the managed runtime, which changes the tradeoffs involved in designing hardware for them. Our goal is to study hardware performance bottlenecks in .NET applications. To find suitable benchmarks, we use Principal Component Analysis (PCA) to identify redundancies in a set of open-source .NET and ASP.NET benchmarks and apply hierarchical clustering to create representative subsets. We characterize these subsets at the microarchitecture and application levels and show that they differ significantly from the SPEC CPU2017 benchmarks in branch and memory behavior, and hence merit consideration in architecture research. In-depth analysis using the Top-Down methodology reveals that the .NET benchmarks are significantly more frontend bound. We also analyze the effect of managed-runtime events such as Just-in-Time (JIT) compilation and Garbage Collection (GC). Among other findings, GC improves cache performance significantly, and JIT compilation could benefit from aggressive prefetching and transformation of hardware microarchitectural state to avoid frequent cold starts. As computing increasingly moves to the cloud and managed languages continue to grow in popularity, it is important to consider .NET-like benchmarks in architecture studies.
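The subsetting methodology mentioned above (PCA over per-benchmark performance-counter vectors, followed by hierarchical clustering) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the benchmark names, metric count, and the 90%-variance and three-cluster thresholds are all hypothetical choices, and the data is randomly generated rather than measured.

```python
# Sketch: PCA + hierarchical clustering to pick a representative benchmark subset.
# All names and data below are hypothetical illustrations.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
benchmarks = ["bench_a", "bench_b", "bench_c", "bench_d", "bench_e", "bench_f"]
# Rows: benchmarks; columns: microarchitectural metrics (e.g. IPC,
# branch-miss rate, cache-miss rates, frontend-bound fraction).
X = rng.normal(size=(len(benchmarks), 8))

# Standardize each metric, then compute PCA via SVD; keep enough
# principal components to cover ~90% of the variance.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)
k = int(np.searchsorted(np.cumsum(explained), 0.9) + 1)
scores = Xc @ Vt[:k].T  # benchmarks projected into PCA space

# Ward-linkage hierarchical clustering in PCA space; cut the dendrogram
# into (at most) three clusters and keep one benchmark from each.
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
subset = {c: benchmarks[int(np.argmax(labels == c))] for c in np.unique(labels)}
print(sorted(subset.values()))
```

The representative per cluster here is simply the first member; a distance-based choice (the benchmark closest to its cluster centroid) is a common refinement.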
