SUCA: a scalable unicore architecture with novel instruction encoding and distributed execution control

The scalability gained by partitioning resources among clusters is still limited by traditional instruction encoding and by centralized control of instruction execution. This paper introduces a scalable unicore clustered architecture (SUCA). Its instruction encoding stores the information common to a sequence of instructions separately, reducing the amount of information carried in each instruction word. Its pipeline lets functional units manage their own execution, freeing instruction issue from instruction scheduling. SUCA can scale to 32 clusters with 1024 registers. For the 4-cluster configuration, SUCA achieves an average speedup of 13.3% and a 4.6% improvement in frequency over a traditional clustered processor, with reasonable hardware overhead.
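
The abstract does not specify SUCA's actual instruction format, so the following is only a rough sketch of the general idea of factoring shared information out of instruction words. It assumes the shared information is a cluster identifier, that the 1024 registers are split evenly across 32 clusters (32 registers each), and that every instruction reads and writes registers of a single cluster. All field widths and names (`OPCODE_BITS`, `COUNT_BITS`, `Instr`, `flat_bits`, `factored_bits`) are hypothetical.

```python
from itertools import groupby
from typing import NamedTuple

# Hypothetical field widths; none of these come from the paper.
OPCODE_BITS = 8
CLUSTER_BITS = 5                                  # 2**5 = 32 clusters
LOCAL_REG_BITS = 5                                # assumed 32 registers per cluster
GLOBAL_REG_BITS = CLUSTER_BITS + LOCAL_REG_BITS   # 10 bits address 1024 registers
COUNT_BITS = 4                                    # header field: run length 1..16


class Instr(NamedTuple):
    opcode: int
    dst: int    # global register numbers in 0..1023
    src1: int
    src2: int


def cluster_of(instr: Instr) -> int:
    """Return the cluster that holds all operands of an instruction."""
    clusters = {r >> LOCAL_REG_BITS for r in (instr.dst, instr.src1, instr.src2)}
    if len(clusters) != 1:
        raise ValueError("sketch assumes all operands live in one cluster")
    return clusters.pop()


def flat_bits(program: list[Instr]) -> int:
    """Size when every instruction word carries full global register numbers."""
    return len(program) * (OPCODE_BITS + 3 * GLOBAL_REG_BITS)


def factored_bits(program: list[Instr]) -> int:
    """Size when each run of same-cluster instructions shares one header."""
    header = CLUSTER_BITS + COUNT_BITS
    word = OPCODE_BITS + 3 * LOCAL_REG_BITS
    max_run = 1 << COUNT_BITS
    total = 0
    for _, run in groupby(program, key=cluster_of):
        n = len(list(run))
        headers = -(-n // max_run)  # ceiling division: long runs need extra headers
        total += headers * header + n * word
    return total


# Four instructions in cluster 0, then two in cluster 3 (registers 96..99).
program = [
    Instr(1, 0, 1, 2), Instr(2, 3, 0, 1), Instr(1, 4, 3, 2), Instr(3, 5, 4, 0),
    Instr(1, 96, 97, 98), Instr(2, 99, 96, 97),
]
print(flat_bits(program), factored_bits(program))  # 228 156
```

Under these assumed widths, each instruction word shrinks from 38 to 23 bits, at the cost of one 9-bit header per run. The saving therefore depends on how long the runs of instructions sharing a header are, which is a property of the compiled code rather than of the encoding alone.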
