Architectural support for Cilk computations on many-core architectures

Future generations of high-performance processors have the potential to integrate tens to hundreds of processing cores on a single chip. In recent years, many-core architectures (MCA) [1] have been proposed as a promising platform for exploiting massive parallelism. Although previous work has demonstrated encouraging performance potential, there is still little consensus on how to program many-core architectures. In this work, we propose architectural support for Cilk computations [3, 4] on the Godson-T V3 architecture [2]. Our design has two hardware components. The first provides the cache control mechanisms needed to support multi-threaded programming based on scope consistency (ScC). We choose the ScC model because it permits more scalable cache consistency protocols. In addition, we propose the coherence vector, a hardware structure with associated programming interfaces, to further improve the programmability of ScC. With this architectural support, we port the multi-threaded Cilk runtime system (Cilk-5.4.6), which assumes sequential consistency, smoothly onto our many-core architecture model, which implements ScC. The second component is architectural support for DAG consistency, which relieves application programmers from handling cache consistency issues themselves. We make two contributions: (1) We propose a set of architectural mechanisms to support the Cilk programming model, and show that it is possible to balance two conflicting goals, programmability and scalable cache consistency protocols; achieving this balance is important for MCA. (2) Our experimental results reveal two fundamental factors that limit the performance scalability of MCA: unbalanced on-chip network bandwidth usage and limited memory bandwidth.

2. Architectural Support for Programmability