Characterizing the resource-sharing levels in the UltraSPARC T2 processor

Thread-level parallelism (TLP) has become a popular way to improve processor performance, overcoming the limits of extracting instruction-level parallelism. Each TLP paradigm, such as Simultaneous Multithreading or Chip-Multiprocessors, provides different benefits, which has motivated processor vendors to combine several TLP paradigms in a single chip design. Although most of these combined-TLP designs are homogeneous, they present different levels of hardware resource sharing, which complicates operating system scheduling and load balancing. Commonly, processor designs provide two levels of resource sharing: Inter-core, in which only the highest levels of the cache hierarchy are shared, and Intra-core, in which most of the hardware resources of the core are shared. Recently, Sun Microsystems released the UltraSPARC T2, a processor with three levels of hardware resource sharing: InterCore, IntraCore, and IntraPipe. In this work, we provide the first characterization of a three-level resource-sharing processor, the UltraSPARC T2, and we show how multi-level resource sharing affects operating system design. We further identify the most critical hardware resources in the T2 and the characteristics of applications that are not sensitive to resource sharing. Finally, we present a case study in which we run a real multithreaded network application, showing that a resource-sharing-aware scheduler can improve system throughput by up to 55%.
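As an illustration of the three sharing levels, the following minimal sketch classifies the level at which two hardware threads (strands) share resources. It assumes the T2 topology of 8 cores, 2 pipelines per core, and 4 strands per pipeline, and it assumes strands are numbered linearly by core, then pipeline; neither the numbering scheme nor the helper names `locate` and `sharing_level` come from the text above.

```python
# Assumed topology (not stated in the abstract):
# 8 cores x 2 pipelines per core x 4 strands per pipeline = 64 strands.
CORES = 8
PIPES_PER_CORE = 2
STRANDS_PER_PIPE = 4
STRANDS_PER_CORE = PIPES_PER_CORE * STRANDS_PER_PIPE
TOTAL_STRANDS = CORES * STRANDS_PER_CORE


def locate(strand_id):
    """Return (core, pipe) for a strand, assuming linear numbering."""
    if not 0 <= strand_id < TOTAL_STRANDS:
        raise ValueError(f"strand id {strand_id} out of range")
    core, offset = divmod(strand_id, STRANDS_PER_CORE)
    pipe = offset // STRANDS_PER_PIPE
    return core, pipe


def sharing_level(strand_a, strand_b):
    """Name the level at which two distinct strands share hardware."""
    core_a, pipe_a = locate(strand_a)
    core_b, pipe_b = locate(strand_b)
    if core_a != core_b:
        return "InterCore"   # different cores
    if pipe_a != pipe_b:
        return "IntraCore"   # same core, different pipelines
    return "IntraPipe"       # same core, same pipeline


if __name__ == "__main__":
    for pair in [(0, 1), (0, 4), (0, 8)]:
        print(pair, sharing_level(*pair))
```

A resource-sharing-aware scheduler could use such a classification to decide which threads to co-locate, for example by placing threads that contend heavily for the same resources on different pipelines or cores.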
