Instruction scheduling for clustered VLIW DSPs

Recent digital signal processors (DSPs) show a homogeneous VLIW-like data path architecture, which allows C compilers to generate efficient code. However, code generation for VLIW DSPs must still obey some special restrictions. In order to reduce the number of register file ports needed to provide data for multiple functional units working in parallel, the DSP data path may be clustered into several sub-paths, with very limited capabilities for exchanging values between the different clusters. An example is the well-known Texas Instruments C6201 DSP. For such an architecture, the tasks of scheduling instructions and partitioning them between the clusters are highly interdependent. This paper presents a new instruction scheduling approach which, in contrast to earlier work, integrates partitioning and scheduling into a single technique in order to achieve high code quality. We show experimentally that the proposed technique is capable of generating more efficient code than a commercial code generator for the TI C6201.
