An effective shared memory allocator for reducing false sharing in NUMA multiprocessors

Non-uniform memory access (NUMA) time is an important issue in the design of large-scale shared memory multiprocessors. One implication of a NUMA architecture is that locality of reference is crucial to the performance of the entire system, so one or more levels of the system must support the exploitation of locality for efficient data sharing. Unfortunately, data sharing introduces a problem called false sharing, which occurs when several independent objects, possibly with different access patterns, are allocated to the same unit of movable memory (in our case, a page of virtual memory). In this paper we propose a simple and effective shared memory allocation mechanism for reducing false sharing. Our design goal is to reduce the occurrence of false sharing misses by allocating independent objects that may have different access patterns to different pages. We use execution-driven simulation of real parallel applications to evaluate the effectiveness of our shared memory allocator. Our results show that with our allocator the number of false sharing misses can be reduced considerably, which in turn reduces the overhead of the memory coherence protocol.
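The allocation policy described above can be illustrated with a small Python simulation. This is a minimal sketch under stated assumptions, not the allocator evaluated in the paper: it assumes a 4 KB page as the unit of coherence, and it assumes each allocation request carries a hypothetical `tag` naming the object's expected access pattern (for example, the processor that mainly writes it). The names `BumpAllocator`, `SegregatingAllocator` and `mixed_pages` are illustrative. The baseline packs objects in request order, while the segregating allocator gives each tag its own pages; `mixed_pages` counts pages that hold objects with more than one tag, i.e. the pages exposed to false sharing.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (the unit of coherence here)


class BumpAllocator:
    """Baseline: packs every request at consecutive addresses, so
    unrelated objects can land on the same page."""

    def __init__(self):
        self.next_addr = 0

    def alloc(self, size, tag):
        # 'tag' is ignored: placement depends only on request order.
        addr = self.next_addr
        self.next_addr += size
        return addr


class SegregatingAllocator:
    """Sketch: objects with different (hypothetical) access-pattern tags
    are carved out of disjoint pages, so they never share a page."""

    def __init__(self):
        self.next_page = 0  # next unused page number
        self.cursor = {}    # tag -> (base address of its current page, offset)

    def _new_pages(self, npages):
        base = self.next_page * PAGE_SIZE
        self.next_page += npages
        return base

    def alloc(self, size, tag):
        if size <= 0:
            raise ValueError("size must be positive")
        base, off = self.cursor.get(tag, (None, PAGE_SIZE))
        if off + size > PAGE_SIZE:
            # This tag has no page yet, or its page is full: take fresh pages.
            npages = -(-size // PAGE_SIZE)  # ceiling division
            base, off = self._new_pages(npages), 0
        self.cursor[tag] = (base, off + size)
        return base + off


def mixed_pages(placements):
    """Count pages holding objects with more than one tag; these are the
    pages on which false sharing can occur."""
    tags_on_page = defaultdict(set)
    for addr, size, tag in placements:
        for page in range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1):
            tags_on_page[page].add(tag)
    return sum(1 for tags in tags_on_page.values() if len(tags) > 1)


if __name__ == "__main__":
    # Interleaved 64-byte objects requested on behalf of two processors.
    requests = [(64, "p0"), (64, "p1")] * 8
    for allocator in (BumpAllocator(), SegregatingAllocator()):
        placed = [(allocator.alloc(size, tag), size, tag) for size, tag in requests]
        print(type(allocator).__name__, mixed_pages(placed))
    # prints "BumpAllocator 1" then "SegregatingAllocator 0"
```

Keeping tags on separate pages trades memory for coherence traffic: each tag may leave a partially filled page, so the segregating allocator can touch more pages than the baseline in exchange for never placing differently accessed objects together.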
