Characterizing the caching and synchronization performance of a multiprocessor operating system

Good cache memory performance is essential to achieving high CPU utilization in shared-memory multiprocessors. While the performance of caches is determined by both application and operating system (OS) references, most research has focused on the cache performance of applications alone. This is partially due to the difficulty of measuring OS activity and, as a result, the cache performance of the OS is largely unknown. In this paper, we characterize the cache performance of a commercial System V UNIX running on a four-CPU multiprocessor. The related issue of the performance impact of the OS synchronization activity is also studied. For our study, we use a hardware monitor that records the cache misses in the machine without perturbing it. We study three multiprocessor workloads: a parallel compile, a multiprogrammed load, and a commercial database. Our results show that OS misses occur frequently enough to stall CPUs for 17-21% of their non-idle time. Further, if we include application misses induced by OS interference in the cache, then the stall time reaches 25%. A detailed analysis reveals three major sources of OS misses: instruction fetches, process migration, and data accesses in block operations. As for synchronization behavior, we find that OS synchronization has low overhead if supported correctly and that OS locks show good locality and low contention.
