Does multicast communication make sense in write invalidation traffic?

In distributed shared memory (DSM) multiprocessors, a write operation requires multiple messages to invalidate the nodes that share and cache the memory block being written. The resulting write stall time is a performance hurdle for such systems. One approach to efficient invalidation is to use multicast messages to reach the sharing nodes. We use application-driven simulation to evaluate two multicast-based invalidation schemes: dual path (X. Lin and L.M. Ni, 1993) and pruning (M.P. Malumbres et al., 1996). Under our experimental settings, multicast improves invalidation traffic for four of the six real applications evaluated. The remaining two programs are computation-intensive, and multicast-based invalidation is less effective for them. However, because these programs generate bursty communication, multicast helps relieve network congestion during the bursts. Dual path performs slightly better than pruning because it is less sensitive to routing delay in the routers. We also found that cache size is an important design parameter: multicast-based invalidation is more effective in DSM multiprocessors with large caches.
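To make the contrast between per-sharer invalidation messages and multicast concrete, here is a minimal Python sketch of the dual-path idea on a 2-D mesh. It is an illustration under stated assumptions, not the authors' simulator or the exact formulation of Lin and Ni: the 4x4 mesh, the "snake" Hamiltonian labeling, the greedy label-monotone neighbor rule, the helper names (`label`, `route`, `dual_path`, `unicast_cost`), and the sharing set are all hypothetical. Cost is counted only as messages and link traversals; contention, wormhole timing, acknowledgments, and the pruning scheme are not modeled.

```python
"""Dual-path multicast on a 2-D mesh: a minimal sketch (assumed details).

Assumptions (not from the abstract): a width x height mesh, a "snake" Hamiltonian
labeling, greedy label-monotone neighbor selection, and hop counts as the cost
metric. Contention, wormhole timing, and acknowledgments are not modeled.
"""
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (x, y) mesh coordinates


def label(node: Node, width: int) -> int:
    """Snake labeling: even rows run left to right, odd rows right to left."""
    x, y = node
    return y * width + (x if y % 2 == 0 else width - 1 - x)


def neighbors(node: Node, width: int, height: int) -> List[Node]:
    x, y = node
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < width and 0 <= b < height]


def route(src: Node, dst: Node, width: int, height: int) -> List[Node]:
    """Label-monotone path: each hop goes to the neighbor whose label is closest
    to dst's label without overshooting it (the Hamiltonian successor or
    predecessor always qualifies, so every hop makes progress)."""
    target = label(dst, width)
    up = target > label(src, width)
    path, cur = [src], src
    while cur != dst:
        nbrs = neighbors(cur, width, height)
        if up:
            cur = max((n for n in nbrs if label(n, width) <= target),
                      key=lambda n: label(n, width))
        else:
            cur = min((n for n in nbrs if label(n, width) >= target),
                      key=lambda n: label(n, width))
        path.append(cur)
    return path


def dual_path(src: Node, dests: List[Node], width: int, height: int) -> Dict[str, List[Node]]:
    """Split sharers into a high-label and a low-label group; send one message
    per group that visits its destinations in label order."""
    ls = label(src, width)
    groups = {
        "high": sorted((d for d in dests if label(d, width) > ls),
                       key=lambda d: label(d, width)),
        "low": sorted((d for d in dests if label(d, width) < ls),
                      key=lambda d: -label(d, width)),
    }
    paths: Dict[str, List[Node]] = {}
    for name, group in groups.items():
        if not group:
            continue  # no sharers on this side: no message needed
        path, cur = [src], src
        for d in group:
            path += route(cur, d, width, height)[1:]
            cur = d
        paths[name] = path
    return paths


def unicast_cost(src: Node, dests: List[Node]) -> Tuple[int, int]:
    """One message per sharer with dimension-order routing: (messages, hops)."""
    return len(dests), sum(abs(src[0] - d[0]) + abs(src[1] - d[1]) for d in dests)


if __name__ == "__main__":
    writer = (1, 1)
    sharers = [(3, 0), (0, 3), (2, 2), (3, 3)]  # hypothetical sharing set
    worms = dual_path(writer, sharers, 4, 4)
    hops = sum(len(p) - 1 for p in worms.values())
    print(f"dual path: {len(worms)} messages, {hops} link traversals")
    msgs, uhops = unicast_cost(writer, sharers)
    print(f"unicast:   {msgs} messages, {uhops} link traversals")
```

For this hypothetical sharing set the sketch sends 2 multicast messages covering 10 link traversals, against 4 unicast messages covering 12. The trade-off is visible too: the high-label message travels 7 hops before reaching its last sharer, while no single unicast travels more than 4, so per-hop routing delay affects how much a multicast scheme gains.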

[1] Anoop Gupta et al. Parallel computer architecture: a hardware/software approach, 1998.

[2] Chung-Ta King et al. Boosting the performance of NOW-based shared memory multiprocessors through directory hints, 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.

[3] Xiaola Lin et al. Multicast Communication in Multicomputer Networks, 1993, ICPP.

[4] Wen-Tsuen Chen et al. Multiple traffic scheduling for enhanced General Packet Radio Service, 2001, IEEE 54th Vehicular Technology Conference. VTC Fall 2001. Proceedings (Cat. No.01CH37211).

[5] David R. O'Hallaron et al. Earthquake ground motion modeling on parallel computers, 1996, Supercomputing '96.

[6] Chung-Ta King et al. Modeling and evaluating peer-to-peer storage architectures, 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[7] Chung-Ta King et al. Implementation and evaluation of directory hints in CC-NUMA multiprocessors, 2002, Parallel Comput.

[8] Wen-Tsuen Chen et al. A novel code assignment scheme for W-CDMA systems, 2001, IEEE 54th Vehicular Technology Conference. VTC Fall 2001. Proceedings (Cat. No.01CH37211).

[9] Anoop Gupta et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.

[10] Wen-Tsuen Chen et al. Enhancing CRTP by retransmission for wireless networks, 2001, Proceedings Tenth International Conference on Computer Communications and Networks (Cat. No.01EX495).

[11] Chung-Ta King et al. MICA: a memory and interconnect simulation environment for cache-based architectures, 2000, Proceedings 33rd Annual Simulation Symposium (SS 2000).

[12] Chung-Ta King et al. The thread-based protocol engines for CC-NUMA multiprocessors, 2000, Proceedings 2000 International Conference on Parallel Processing.

[13] A. Gupta et al. The Stanford FLASH multiprocessor, 1994, Proceedings of the 21st International Symposium on Computer Architecture.

[14] Chung-Ta King et al. Tailoring a DSM simulation environment for edge cache architecture, 2001.

[15] Chung-Ta King et al. Neuron: a wide-area service discovery infrastructure, 2002, Proceedings International Conference on Parallel Processing.

[16] Chung-Ta King et al. A Simulation Toolkit for x86-Compatible Processors - XSim, 1999, Int. J. High Speed Comput.

[17] Josep Torrellas et al. An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors, 1996, Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing.

[18] Ramesh Subramonian et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.