Addressing Cache/Memory Overheads in Enterprise Java CMP Servers

As we enter the era of chip multiprocessor (CMP) architectures, it is important to explore how mainstream server workloads scale on these platforms. In this paper, we analyze the performance of two significant enterprise Java workloads (SPECjAppServer2004 and SPECjbb2005) on present and future CMP platforms. We start by characterizing the core, cache and memory behavior of these workloads on the newly released Intel Core 2 Duo Xeon platform (dual-core, dual-socket). These measurements indicate that the performance of both workloads depends significantly on the cache and memory subsystems. To guide the evolution of future CMP platforms, we perform a detailed investigation of potential cache and memory architecture choices, including an analysis of the effects of thread sharing and migration, object allocation and garbage collection. Based on the observed behavior, we propose architectural optimizations along three dimensions: (a) data-less cache line initialization (DCLI), (b) hardware-guided thread collocation (HGTC) and (c) on-socket DRAM caches (OSDC). We describe these optimizations in detail and validate their performance potential using trace-driven simulations and execution-driven emulation. Overall, we expect these findings to guide future CMP architectures for enterprise Java servers.
