Paper information - The thread-based protocol engines for CC-NUMA multiprocessors

The thread-based protocol engines for CC-NUMA multiprocessors

With the rapid advance of Internet services, large-scale, high-performance servers such as CC-NUMA multiprocessors are gaining importance in network computing. In a CC-NUMA multiprocessor, the key component connecting a computing node to the interconnection network is the node controller. Node controllers perform protocol processing to exchange messages with other nodes in the system. As new-generation CC-NUMA multiprocessors move towards application-specific protocol processing, a node controller will require very powerful protocol processors, or engines, to provide the flexibility to process different kinds of protocols. In this paper, we study the design of a thread-based node controller, in which protocol engines have a multithreaded architecture. Multithreading allows protocol processing of different requests to proceed in parallel, thereby reducing blocking and improving response time. Four important design parameters for a multithreaded protocol engine are examined: (1) the number of thread context stores, (2) the number of protocol operation units, (3) the scheduling policy, and (4) the thread allocation scheme. From application-driven simulation of six representative applications, we conclude that the numbers of thread contexts and protocol operation units have a great impact on overall system performance. An appropriate thread allocation scheme for invalidation traffic is needed, and prioritizing threads and scheduling them accordingly is also important for system performance.

Chung-Ta King | Hung-Chang Hsiao

[1] Anoop Gupta, et al. Parallel computer architecture - a hardware/software approach, 1998.

[2] D. Lenoski, et al. The SGI Origin: A ccNUMA Highly Scalable Server, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[3] Chung-Ta King, et al. MICA: a memory and interconnect simulation environment for cache-based architectures, 2000, Proceedings 33rd Annual Simulation Symposium (SS 2000).

[4] B. J. Smith, et al. A pipelined, shared resource MIMD computer, 1986.

[5] James R. Goodman, et al. Efficient Synchronization: Let Them Eat QOLB, 1997, International Symposium on Computer Architecture.

[6] Michael C. Browne, et al. S-Connect: from networks of workstations to supercomputer performance, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[7] Michael C. Browne, et al. The S3.mp scalable shared memory multiprocessor, 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[8] Anoop Gupta, et al. The Stanford Dash multiprocessor, 1992, Computer.

[9] Anoop Gupta, et al. Performance evaluation of memory consistency models for shared-memory multiprocessors, 1991, ASPLOS IV.

[10] Anoop Gupta, et al. Comparative performance evaluation of cache-coherent NUMA and COMA architectures, 1992, ISCA '92.

[11] Chung-Ta King, et al. Does multicast communication make sense in write invalidation traffic?, 2000, Proceedings Seventh International Conference on Parallel and Distributed Systems (Cat. No.PR00568).

[12] Tom Lovett, et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[13] Anoop Gupta, et al. Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors, 1998, ISCA.

[14] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.

[15] D. A. Wood, et al. Reactive NUMA: A Design For Unifying S-COMA And CC-NUMA, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[16] T. Lovett, et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[17] Herb Schwetman, et al. Using CSIM to model complex systems, 1988, 1988 Winter Simulation Conference Proceedings.

[18] Maged M. Michael, et al. Coherence controller architectures for SMP-based CC-NUMA multiprocessors, 1997, ISCA '97.

[19] Anant Agarwal, et al. APRIL: a processor architecture for multiprocessing, 1990, ISCA '90.

[20] Anoop Gupta, et al. Memory consistency and event ordering in scalable shared-memory multiprocessors, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[21] Michael L. Scott, et al. Contention-free combining tree barriers, 1994.

[22] Todd M. Austin, et al. Zero-cycle loads: microarchitecture support for reducing load latency, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[23] Vicki H. Allan, et al. Petri net versus modulo scheduling for software pipelining, 1995, MICRO 1995.

[24] Chung-Ta King, et al. A Simulation Toolkit for x86-Compatible Processors - XSim, 1999, Int. J. High Speed Comput.

[25] Mark D. Hill, et al. Multiprocessors Should Support Simple Memory-Consistency Models, 1998, Computer.

[26] Donald Yeung, et al. The MIT Alewife machine: architecture and performance, 1995, ISCA '98.

[27] Carla Schlatter Ellis, et al. Experimental comparison of memory management policies for NUMA multiprocessors, 1991, TOCS.

[28] Chung-Ta King, et al. Boosting the performance of NOW-based shared memory multiprocessors through directory hints, 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.

[29] David A. Wood, et al. Decoupled Hardware Support for Distributed Shared Memory, 1996, ISCA.

[30] Anoop Gupta, et al. The performance impact of flexibility in the Stanford FLASH multiprocessor, 1994, ASPLOS VI.

[31] David R. O'Hallaron, et al. Earthquake ground motion modeling on parallel computers, 1996, Supercomputing '96.

[32] Anoop Gupta, et al. Memory consistency and event ordering in scalable shared-memory multiprocessors, 1990, ISCA '90.

[33] Sudhakar Yalamanchili, et al. Interconnection Networks: An Engineering Approach, 2002.