Performance isolation: sharing and isolation in shared-memory multiprocessors

Shared-memory multiprocessors (SMPs) are widely used as general-purpose servers. The tight coupling of multiple processors, memory, and I/O provides enormous computing power in a single system and enables efficient sharing of these resources. The operating systems for these machines (UNIX or Windows NT) provide very few controls for sharing the resources of the system among the active tasks or users. This unconstrained sharing model is a serious limitation for a server because the load placed by one user can adversely affect other users' performance in an unpredictable manner. We show that this lack of isolation is caused by the resource allocation scheme (or lack thereof) carried over from single-user workstations. Multi-user multiprocessor systems require more sophisticated resource management, and we show how the proposed "performance isolation" scheme can address the current weaknesses of these systems. We have implemented performance isolation in the Silicon Graphics IRIX operating system for three important system resources: CPU time, memory, and disk bandwidth. Running a number of workloads, we show that our proposed scheme is successful at providing workstation-like isolation under heavy load, SMP-like latency under light load, and SMP-like throughput in all cases.
