Paper Information - Speculative Multithreading Architectures

Speculative Multithreading Architectures

With the conventional superscalar approach delivering diminishing returns, alternative designs that make optimal use of increasing chip densities are actively being explored. Unlike the conventional superscalar, which expends all its resources on exploiting instruction-level parallelism (ILP), these designs place much emphasis on also exploiting thread-level parallelism in the application. Chip-multiprocessor (CMP) architectures are one promising approach in this direction that can also better exploit the increasing transistor count on a chip. In these architectures, speculation may be employed to execute applications that are either sequential or cannot be parallelized efficiently. Unfortunately, current approaches either use limited hardware support that permits only a restricted communication mechanism between processors, namely through memory, or use a large amount of speculation hardware that remains unutilized when running an explicitly parallel program or a multiprogrammed workload. In our thesis, we show that the wide-issue dynamic processors that will soon populate CMPs make fast communication at the register level a requirement for high performance. Consequently, we propose effective but quite modest hardware that supports communication and synchronization of registers between on-chip processors. Furthermore, we propose hardware support that handles true memory dependence violations when the application is run in a speculative execution mode. We also present the compiler support that enables automatic identification of threads from sequential binaries. We show how this software-hardware approach enables effective speculative execution of a sequential binary on a CMP architecture without source recompilation. Overall, we augment the CMP with just enough support, while still maintaining the generic CMP architecture to a reasonable degree.
Given that the amount of thread- and instruction-level parallelism in applications varies widely, the traditional CMP approach of statically partitioning the chip resources between threads may lead to wasted resources when one of the threads stalls due to hazards or when the application lacks threads. The Simultaneous Multithreading (SMT) architecture addresses this problem by allowing complete flexibility in resource sharing. Unfortunately, this approach, like the conventional superscalar, is so centralized that it may not be a feasible architecture. In our thesis, we explore a hybrid approach, namely the clustered SMT architecture. We show that this restricted level of simultaneous multithreading is able to capture most of the performance benefits of the fully centralized approach while, at the same time, allowing the design to be decentralized. The final part of our thesis focuses on simulation methodology. Multiprocessor system evaluation has traditionally been based on direct-execution-based Execution-Driven Simulation (EDS). In such environments, the proc

Venkata Krishnan