Changing with the times: adaptive interconnects and coherence for future chip multiprocessors

Chip multiprocessors (CMPs) have emerged as the practical alternative to raising the clock frequency of a single core: they scale performance through parallelism to meet the increasing demands of applications. As CMPs scale to larger numbers of processing cores, the demands on the on-chip communication framework will grow to satisfy the data and communication requirements of future multithreaded applications. This problem is exacerbated by poor wire scaling, which increases the latency and power consumption of on-chip communication. In response, two alternative interconnects have emerged, both based on electromagnetic wave propagation and both with latency effectively limited by the speed of light: optical interconnect (OI) and RF interconnect (RF-I).

In the first part of this dissertation, we focus on the use of alternative interconnects in future many-core systems to provide performance and power benefits by reducing on-chip access latency. Most conventional networks-on-chip (NoCs) allocate link bandwidth uniformly in order to provide sufficient bandwidth for varying traffic demands. By studying the communication demands of different applications, we observed that applications tend to exhibit diverse patterns of communication. We demonstrate the use of RF-I to adapt to these varying communication patterns by flexibly allocating RF-I bandwidth to the critical paths of communication. By allocating RF-I bandwidth between components that communicate frequently and using lower bandwidth in other parts of the NoC, we can provide NoC power savings without significant loss in performance.

To leverage the abundant processing resources available on-chip, future many-core systems will require an effective means of sharing data between the collaborating cores. Hence, a power-efficient, scalable, and coherent interconnect fabric is vital to scaling application performance in the many-core era.
We propose a scalable architecture that enables snooping-based coherence by introducing a low-latency interconnect structure specialized for store traffic, in addition to the regular baseline NoC that carries all other traffic. Separating store requests from the rest of the on-chip traffic avoids the impact of stores on load latency and bandwidth. We demonstrate the performance and power advantages of our snooping-based cache coherence architecture.

As part of this dissertation, we also study the scalability of the two emerging alternative interconnect technologies by providing a quantitative comparison of OI and RF-I at the same technology generation. Ultimately, we show where OI and RF-I are most likely to be used in future designs. Our analysis covers both on-chip and chip-to-chip communication.