Modulo scheduling for the TMS320C6x VLIW DSP architecture

Digital Signal Processing (DSP) architectures are specialized for high-performance numerical algorithms such as those found in communication and multimedia applications. The development of efficient compilers for DSPs is a growing research area. The Texas Instruments TMS320C6x (C6x) is a Very Long Instruction Word (VLIW) DSP architecture capable of issuing eight operations in parallel. In this paper, we present the results of implementing a software pipelining algorithm for the C6x. We describe the C6x and detail the architectural features that affect software pipelining, such as a moderately sized register file, constraints on code size, homogeneous resources, and multiple assignment code. We discuss how we adapted modulo scheduling to implement software pipelining for the C6x. Finally, we present the results of modulo scheduling a set of 40 loop kernel benchmarks and evaluate the algorithm in terms of schedule quality and algorithm complexity.
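Modulo scheduling picks an initiation interval (II), the number of cycles between the starts of successive loop iterations, and places every operation of one iteration so that the overlapped copies never oversubscribe a functional unit or violate a dependence. A common starting point is the lower bound MII = max(ResMII, RecMII), where ResMII comes from functional-unit usage and RecMII from loop-carried dependence cycles. The Python sketch below illustrates that general technique under loud simplifications: the unit model (two copies each of the L, S, M and D units, with clusters and cross paths ignored), the operation latencies, and all function names are assumptions for illustration. It is a greedy scheduler that simply retries with a larger II, not the authors' C6x algorithm and not Rau's iterative modulo scheduler, which backtracks.

```python
import math
from collections import defaultdict

# Assumed machine model (a simplification, not the paper's): each C6x unit
# class has two copies, one per register-file side; cluster assignment,
# cross paths and register pressure are ignored.
UNITS = {"L": 2, "S": 2, "M": 2, "D": 2}


def res_mii(ops):
    """Resource bound: ceil(uses / copies), maximized over unit classes."""
    uses = defaultdict(int)
    for _name, unit in ops:
        uses[unit] += 1
    return max(math.ceil(count / UNITS[unit]) for unit, count in uses.items())


def has_positive_cycle(n, edges, ii):
    """Floyd-Warshall longest paths with edge weight latency - ii * distance."""
    neg = float("-inf")
    d = [[neg] * n for _ in range(n)]
    for src, dst, lat, dist in edges:
        d[src][dst] = max(d[src][dst], lat - ii * dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] != neg and d[k][j] != neg:
                    d[i][j] = max(d[i][j], d[i][k] + d[k][j])
    return any(d[i][i] > 0 for i in range(n))


def rec_mii(n, edges):
    """Recurrence bound: smallest II under which no dependence cycle is violated.
    Assumes every cycle has total distance >= 1 (otherwise no II exists)."""
    ii = 1
    while has_positive_cycle(n, edges, ii):
        ii += 1
    return ii


def try_schedule(ops, edges, ii):
    """Greedy, non-backtracking placement into a modulo reservation table.
    Assumes ops are listed in topological order of distance-0 edges."""
    mrt = [defaultdict(int) for _ in range(ii)]  # row = cycle mod II
    start = [None] * len(ops)
    for op, (_name, unit) in enumerate(ops):
        earliest = 0
        for src, dst, lat, dist in edges:
            if dst == op and start[src] is not None:
                earliest = max(earliest, start[src] + lat - ii * dist)
        for t in range(earliest, earliest + ii):  # II slots cover every row
            if mrt[t % ii][unit] < UNITS[unit]:
                mrt[t % ii][unit] += 1
                start[op] = t
                break
        else:
            return None  # this unit class is full in every row
    # Loop-carried edges into earlier ops were skipped above; check all now.
    for src, dst, lat, dist in edges:
        if start[dst] < start[src] + lat - ii * dist:
            return None
    return start


def modulo_schedule(ops, edges, max_ii=64):
    """Try II = MII, MII + 1, ... and return (II, start cycles) or None."""
    mii = max(res_mii(ops), rec_mii(len(ops), edges))
    for ii in range(mii, max_ii + 1):
        start = try_schedule(ops, edges, ii)
        if start is not None:
            return ii, start
    return None


# Hypothetical dot-product body: two loads (D), multiply (M), accumulate (L).
# Edges are (src, dst, latency, iteration distance); assumed latencies are
# LDW 5, MPY 2, ADD 1, and the accumulator carries a distance-1 recurrence.
ops = [("ldw_a", "D"), ("ldw_b", "D"), ("mpy", "M"), ("add", "L")]
edges = [(0, 2, 5, 0), (1, 2, 5, 0), (2, 3, 2, 0), (3, 3, 1, 1)]
print(modulo_schedule(ops, edges))  # (1, [0, 0, 5, 7])
```

Comparing the returned II against the computed MII is one simple way to express schedule quality: since MII is a lower bound, an II equal to MII cannot be improved for that loop body under this machine model.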
