Data dependence graph directed scheduling for clustered VLIW architectures

This paper presents an instruction scheduling and cluster assignment approach for clustered very long instruction word (VLIW) processors. The technique produces high-performance code by simultaneously balancing instructions among clusters and minimizing inter-cluster data communication. The scheme is evaluated on benchmarks extracted from UTDSP. Results show speed-ups of up to 44% over previously used techniques, with average speed-ups ranging from 14% (2-cluster) to 18% (4-cluster).
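
The trade-off described in the abstract, between keeping clusters evenly loaded and keeping dependent instructions on the same cluster, can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify; it is a generic greedy heuristic written for illustration only. The function names `assign_clusters` and `count_copies`, the weights `comm_weight` and `balance_weight`, and the toy dependence graph are all assumptions.

```python
def assign_clusters(ddg, order, num_clusters, comm_weight=1.0, balance_weight=1.0):
    """Greedily assign instructions to clusters (illustrative heuristic only).

    ddg:   dict mapping each instruction to the list of instructions it depends on.
    order: the instructions in a topological order of the data dependence graph.
    """
    assignment = {}
    load = [0] * num_clusters  # number of instructions placed on each cluster
    for instr in order:
        best_cluster, best_cost = None, None
        for c in range(num_clusters):
            # Each predecessor on a different cluster needs an inter-cluster copy.
            copies = sum(1 for p in ddg.get(instr, []) if assignment[p] != c)
            cost = comm_weight * copies + balance_weight * load[c]
            if best_cost is None or cost < best_cost:
                best_cluster, best_cost = c, cost
        assignment[instr] = best_cluster
        load[best_cluster] += 1
    return assignment


def count_copies(ddg, assignment):
    """Count dependence edges whose endpoints sit on different clusters."""
    return sum(
        1
        for instr, preds in ddg.items()
        for p in preds
        if assignment[p] != assignment[instr]
    )


# Hypothetical toy graph: two independent chains, a -> b -> c and d -> e -> f.
ddg = {"a": [], "b": ["a"], "c": ["b"], "d": [], "e": ["d"], "f": ["e"]}
order = ["a", "b", "c", "d", "e", "f"]
assignment = assign_clusters(ddg, order, num_clusters=2, comm_weight=2.0)
print(assignment, count_copies(ddg, assignment))
```

With these assumed weights, each chain stays on its own cluster, so the load is balanced and no inter-cluster copies are needed. Raising `balance_weight` relative to `comm_weight` would instead spread a single chain across clusters at the cost of extra copies.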
