Design of Non-Critical Path Resource Distributor for SMT Processors

The performance of simultaneous multithreading (SMT) processors is mainly determined by how the shared resources are distributed among the threads. However, resource distribution methods often need multiple cycles to compute an allocation solution, and placing such a multi-cycle resource distributor in the critical pipeline path has a non-negligible impact on SMT processor performance. This work proposes a non-critical path resource distributor (NCPRD) for SMT processors. NCPRD moves resource distribution out of the critical pipeline path, so the cycles spent computing the allocation solution are no longer wasted. Our limit-case study shows that NCPRD's asynchronous operating mode improves both throughput and fairness metrics for all workload types except memory-intensive workloads. It also shows that NCPRD's advantage over a critical-path-involved resource distributor grows as computing the allocation solution takes more cycles.
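The timing argument can be made concrete with a toy cycle-count model. This is a hypothetical sketch, not the authors' simulator. The function `simulate`, its parameters, the fixed request interval, and the rule that a new background request is dropped while one is still in flight are all assumptions made for illustration. The model counts only pipeline stall cycles. It ignores how the quality or staleness of an allocation affects IPC, so it cannot reproduce effects such as the memory-intensive exception noted above.

```python
from dataclasses import dataclass


@dataclass
class Result:
    total_cycles: int     # wall-clock cycles needed to finish the useful work
    stall_cycles: int     # cycles the pipeline spent waiting on the distributor
    applied_updates: int  # allocation solutions actually installed


def simulate(work_cycles: int, interval: int, compute_latency: int,
             critical_path: bool) -> Result:
    """Toy model (hypothetical): a new allocation solution is requested every
    `interval` useful cycles and takes `compute_latency` cycles to compute.

    critical_path=True : the pipeline stalls until the solution is ready.
    critical_path=False: the solution is computed in the background and
                         installed once ready; a request arriving while one
                         is still in flight is dropped (an assumption).
    """
    cycle = 0                 # wall-clock cycles elapsed
    done = 0                  # useful pipeline cycles completed
    stalls = 0
    applied = 0
    ready_at = None           # background mode: cycle the in-flight solution is ready
    next_request = interval   # useful-cycle count of the next request

    while done < work_cycles:
        if done == next_request:
            next_request += interval
            if critical_path:
                # Distributor sits on the critical path: the pipeline waits.
                cycle += compute_latency
                stalls += compute_latency
                applied += 1
            elif ready_at is None:
                # Start computing off the critical path; no stall.
                ready_at = cycle + compute_latency
        # Background mode: install a finished solution without stalling.
        if ready_at is not None and cycle >= ready_at:
            applied += 1
            ready_at = None
        cycle += 1
        done += 1

    return Result(cycle, stalls, applied)


if __name__ == "__main__":
    for latency in (10, 150):
        print(latency, simulate(1000, 100, latency, True),
              simulate(1000, 100, latency, False))
```

With 1,000 useful cycles and a request every 100 cycles, raising the latency from 10 to 150 cycles increases the critical-path total from 1,090 to 2,350 cycles in this model. The background variant stays at 1,000 cycles but installs only 4 of the 9 requested solutions. The stall cost disappears, and the trade-off is that the allocation in use is older.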
