Design and implementation of the NUMAchine multiprocessor

This paper describes the design and implementation of the NUMAchine multiprocessor. As the market for CC-NUMA multiprocessors expands, this research project provides a timely architectural design and a cost-effective prototype. The key to the successful implementation of our 48-processor prototype is the use of off-the-shelf components and programmable logic devices. Since this machine will serve as a research vehicle for parallel software development, the design includes a number of hardware features to enhance experimentation.
