Operating system scheduling for chip multithreaded processors

This dissertation addresses operating system thread scheduling for chip multithreaded processors. Chip multithreaded processors are becoming mainstream thanks to their superior performance and power characteristics. Threads running concurrently on a chip multithreaded processor share the processor's resources, so the degree of resource contention, and therefore performance, depends on the characteristics of the co-scheduled threads. Because the operating system decides which threads are co-scheduled, it directly affects the performance of a chip multithreaded system. This dissertation presents the design of three new scheduling algorithms for chip multithreaded processors: the non-work-conserving algorithm, the target-miss-rate algorithm, and the cache-fair algorithm. These algorithms target contention for the second-level cache, a recognized performance-critical resource, and pursue three objectives: performance optimization, fairness, and performance predictability. They rely on novel analytical performance models and online performance monitoring, and require neither input from applications nor changes to existing hardware structures. The dissertation describes the implementation of these algorithms in a commercial operating system and evaluates their effectiveness.
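The abstract does not spell out how the cache-fair algorithm works. As a minimal, hypothetical sketch of the general idea of compensating a thread for unfair cache sharing through CPU time, the code below assumes that a performance model already supplies `fair_ipc` (the IPC the thread would achieve if the second-level cache were shared fairly) and that online monitoring supplies `actual_ipc` (its measured IPC while co-scheduled). It also assumes a fixed clock rate, so elapsed time is proportional to cycles. The names `ThreadSample` and `adjusted_quantum` are illustrative and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """Hypothetical per-thread data from online monitoring and a model."""
    name: str
    actual_ipc: float  # IPC measured while sharing the cache with co-runners
    fair_ipc: float    # IPC a model predicts under fair cache sharing (assumed given)


def adjusted_quantum(sample: ThreadSample, base_quantum_ms: float) -> float:
    """Return a time slice in which the thread retires as many instructions
    as it would have retired in base_quantum_ms with a fair cache share.

    Assumes a fixed clock rate, so cycles are proportional to elapsed time:
        actual_ipc * new_quantum == fair_ipc * base_quantum
    """
    if sample.actual_ipc <= 0 or sample.fair_ipc <= 0:
        raise ValueError("IPC values must be positive")
    # A thread slowed by cache contention (actual < fair) gets a longer slice;
    # a thread that benefits from an unfair share (actual > fair) gets a shorter one.
    return base_quantum_ms * sample.fair_ipc / sample.actual_ipc


if __name__ == "__main__":
    for s in (ThreadSample("victim", actual_ipc=0.8, fair_ipc=1.0),
              ThreadSample("aggressor", actual_ipc=1.5, fair_ipc=1.2)):
        print(s.name, round(adjusted_quantum(s, 10.0), 3))
```

With a 10 ms base quantum, the slowed-down "victim" thread receives a 12.5 ms slice and the "aggressor" receives 8 ms. A real scheduler would also have to keep total CPU time balanced across threads; this sketch only computes the per-thread target.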
