The home-based approach provides a simple, effective, and scalable way to build software DSMs. However, the performance of home-based software DSMs is sensitive to the distribution of home pages. This paper introduces our work on migrating home pages adaptively according to the application sharing pattern in a home-based software DSM system called JIAJIA. In the scheme, pages that are written by only one processor between two barriers are migrated to that processor. Migration messages are piggybacked on barrier messages, so migration requires no additional communication. Though the scheme is very simple, performance evaluation with the SPLASH program suite and the NAS Parallel Benchmarks shows that home migration can reduce diffs dramatically, and that the resulting performance gains range from several percent to several hundred percent compared to statically distributing the homes of shared data page by page across processors.

1 Introduction

As software DSMs continue to strive against false sharing and communication latency to deliver better performance, their complexity increases steadily. Traditional, well-accepted software DSM protocols such as lazy release consistency (LRC) [9] minimize false sharing and messages by delaying the propagation of page invalidations until the latest possible acquire time, but introduce substantial memory and coherence-related overhead. In TreadMarks [10], which implements LRC, each processor keeps memory-consuming diffs locally until garbage collection, which is expensive in CPU cycles. Moreover, with this diff distribution scheme, a faulting processor has to obtain diffs from all writers of the faulting page, and the same diff may need to be applied many times as different processors fetch it. Memory and coherence-related complexity also limits the scalability of the LRC protocol.
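The migration rule stated in the abstract (a page written by exactly one processor between two barriers moves to that processor, with the notice carried on the barrier message) can be illustrated with a small sketch. This is not JIAJIA's code: the function names, the write-notice format `(processor_id, page_id)`, and the dictionary used as a barrier message are all hypothetical, and the sketch models only the home-assignment decision, not the transfer of page contents or the coherence protocol.

```python
from collections import defaultdict


def plan_home_migrations(write_log, homes):
    """Return {page: new_home} for pages written by exactly one processor
    during the interval that ends at this barrier.

    write_log: iterable of (processor_id, page_id) write notices
               (hypothetical format, assumed for illustration).
    homes:     dict mapping page_id -> current home processor.
    """
    writers = defaultdict(set)
    for proc, page in write_log:
        writers[page].add(proc)

    migrations = {}
    for page, procs in writers.items():
        if len(procs) == 1:
            (writer,) = procs
            # A page already homed at its single writer needs no migration.
            if homes[page] != writer:
                migrations[page] = writer
    return migrations


def barrier(write_log, homes):
    """Model one barrier: piggyback migration notices on the barrier
    message (assumed message layout) and return the updated home map."""
    migrations = plan_home_migrations(write_log, homes)
    barrier_msg = {"type": "BARRIER_RELEASE", "migrations": migrations}
    new_homes = dict(homes)       # leave the caller's map untouched
    new_homes.update(migrations)
    return barrier_msg, new_homes


if __name__ == "__main__":
    homes = {0: 0, 1: 1, 2: 0, 3: 1}
    # Page 0: single writer 1; page 2: two writers; page 3: writer is already home.
    write_log = [(1, 0), (1, 0), (0, 2), (1, 2), (1, 3)]
    msg, homes = barrier(write_log, homes)
    print(msg["migrations"])
    print(homes)
```

In this toy run only page 0 changes home: page 2 has two writers, page 3 is already homed at its only writer, and page 1 is not written at all. Because the decision travels inside the barrier message, the sketch sends no separate migration message, which mirrors the "no additional communication" property claimed above.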