An SMT-Selection Metric to Improve Multithreaded Applications' Performance

Simultaneous multithreading (SMT) increases CPU utilization and application performance in many circumstances, but it can be detrimental when performance is limited by application scalability or when there is significant contention for CPU resources. This paper describes an SMT-selection metric that predicts the change in application performance when the SMT level and number of application threads are varied. This metric is obtained online through hardware performance counters with little overhead, and allows the application or operating system to dynamically choose the best SMT level. We have validated the SMT-selection metric using a variety of benchmarks that capture various application characteristics on two different processor architectures. Our results show that the SMT-selection metric is capable of predicting the best SMT level for a given workload in 90% of the cases. The paper also shows that such a metric can be used with a scheduler or application optimizer to help guide its optimization decisions.

[1]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[2]  J. Morris Chang,et al.  Performance Characterization of Java Applications on SMT Processors , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[3]  John D. McCalpin,et al.  Characterization of simultaneous multithreading (SMT) efficiency in POWER5 , 2005, IBM J. Res. Dev..

[4]  Gurindar S. Sohi 25 Years of the International Symposia on Computer Architecture (Selected Papers) , 1998, ISCA Selected Papers.

[5]  A. Janiszewski,et al.  Architectural support for enhanced SMT job scheduling , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[6]  No License,et al.  Intel ® 64 and IA-32 Architectures Software Developer ’ s Manual Volume 3 A : System Programming Guide , Part 1 , 2006 .

[7]  Rudolf Eigenmann,et al.  Large System Performance of SPEC OMP2001 Benchmarks , 2002, ISHPC.

[8]  Michael Stumm,et al.  Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors , 2007, EuroSys '07.

[9]  Lior Rokach,et al.  Top-down induction of decision trees classifiers - a survey , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[10]  Balaram Sinharoy,et al.  IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.

[11]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[12]  Balaram Sinharoy,et al.  POWER7: IBM's next generation server processor , 2010, 2009 IEEE Hot Chips 21 Symposium (HCS).

[13]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[14]  David H. Bailey,et al.  The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[15]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[16]  Stijn Eyerman,et al.  Probabilistic job symbiosis modeling for SMT processor scheduling , 2010, ASPLOS XV.

[17]  Erich M. Nahum,et al.  Evaluating the impact of simultaneous multithreading on network servers using real hardware , 2005, SIGMETRICS '05.

[18]  Michael E. Thomadakis,et al.  The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms , 2011 .