A Non-Work-Conserving Operating System Scheduler For SMT Processors

Simultaneous multithreading (SMT) processors run multiple threads simultaneously on a single processing core, where the threads compete for the processor's shared resources. Severe resource contention can degrade performance. On SMT processors it is often beneficial to employ a non-work-conserving scheduling policy: running fewer threads simultaneously than the processor allows, even when more threads are ready to run. When contention is severe, non-work-conserving scheduling can relieve it enough to yield better performance than using the processor to its full extent. Conventional operating systems typically do not employ non-work-conserving policies. We present a prototype of an operating system thread scheduler that uses a non-work-conserving policy whenever doing so may improve performance. To decide when to use the non-work-conserving policy, the scheduler relies on an analytical model that, unlike existing models, is simple and practical enough for use inside the operating system. We demonstrate that the scheduler using our model correctly determines when to use the non-work-conserving policy and improves performance in those cases.
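The scheduling decision described above can be illustrated with a minimal sketch. It is not the paper's analytical model, which the abstract does not specify: the function `choose_thread_count`, the `predict` callback, and the toy contention formula in `toy_throughput` are all hypothetical placeholders chosen only to show the shape of the decision, namely running fewer threads than the hardware allows when a model predicts higher throughput.

```python
from typing import Callable


def choose_thread_count(
    hw_contexts: int,
    ready_threads: int,
    predict: Callable[[int], float],
) -> int:
    """Pick how many threads to co-schedule on one SMT core.

    `predict(k)` is a hypothetical model returning the estimated aggregate
    throughput when k threads run simultaneously. Ties keep the
    work-conserving choice (as many threads as possible).
    """
    limit = min(hw_contexts, ready_threads)
    if limit <= 0:
        return 0
    best = limit  # default: work-conserving, fill every hardware context
    for k in range(limit - 1, 0, -1):
        # Go non-work-conserving only if the model predicts a strict gain.
        if predict(k) > predict(best):
            best = k
    return best


def toy_throughput(k: int, penalty: float) -> float:
    """Toy stand-in for a contention model (an assumption, not from the paper).

    Each extra co-running thread slows all threads by a factor that grows
    linearly with `penalty`.
    """
    return k / (1.0 + penalty * (k - 1))


if __name__ == "__main__":
    # Severe contention: the model predicts one thread outperforms two.
    print(choose_thread_count(2, 2, lambda k: toy_throughput(k, 1.5)))
    # Mild contention: filling both contexts is predicted to be better.
    print(choose_thread_count(2, 2, lambda k: toy_throughput(k, 0.1)))
```

In this sketch the scheduler falls back to the work-conserving choice unless the model predicts a strict improvement, reflecting the abstract's statement that the non-work-conserving policy is used only when it may result in better performance.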
