Spatial computation on a homogeneous, many-core architecture

With abundant transistors but limited energy budgets, chip designs have trended towards multiple cores and specialised logic that is used infrequently. This approach allows computer architects to sidestep the utilisation wall: the idea that we can place more transistors on a chip than we can use simultaneously. This suggests that transistors’ functions should be specialised, so only a small fraction of the chip need be active at a time. However, as trade-offs continue to change, this approach will become less effective. Greater heterogeneity increases complexity, which makes it harder to validate the chip’s design, to generate optimised code, and to protect against hardware faults. Furthermore, beyond 28 nm, we can no longer assume that smaller transistors will always be cheaper, so we cannot continue to provide dedicated logic that will be used infrequently.

Instead, we propose switching to a homogeneous approach, and implementing the necessary specialisation in software. Having a single computation unit which is repeated many times reduces complexity and so makes the problems of validation, compilation and fault tolerance easier to solve. Homogeneous systems have the additional advantage that they are general-purpose, so a wider range of applications can be usefully accelerated.

The challenge then becomes: how do we make use of all the available processors? A thread-based approach will only get us so far. Thread-level parallelism (TLP) is abundant in only a small fraction of code, and TLP in general applications has remained stubbornly low [3]. Instead, we show that if communication between cores is low-latency and low-energy, large numbers of them can be grouped together at run-time to implement a virtual architecture optimised for a particular application. This virtual architecture can be given the ideal cache capacity, communication structure and number of functional units to execute a task efficiently. Since the underlying architecture is homogeneous, there is also scope for dynamically varying the resources allocated, depending on circumstances such as contention, priority and power budget.