Code Optimization Techniques of Data-Intensive Tasks onto Statically Scheduled Architectures: Optimal Performance on the TigerSharc

This paper considers code optimization on the novel TS1xx processor from Analog Devices. Very long instruction word (VLIW) architectures, such as the TS1xx, represent the state of the art in high-performance signal processing. The theoretically achievable peak performance of VLIW processors increases steadily with the use of on-chip parallelism. It is demonstrated that C compiler technology cannot achieve peak computing rates on a statically scheduled processor, so the applications programmer must rely on hand-optimized assembler libraries. Writing such libraries requires intimate knowledge of the specific compiler optimization techniques as well as of the underlying hardware. Compiler-friendly code optimized by the VisualC2.0 compiler is compared against hand-optimized assembler code for a common operation involving a loop with multiple memory accesses, floating-point arithmetic and pointer operations. It is found that mature C code for multiplying an n-by-m matrix by a vector executes in roughly 1.18*n*m cycles, whereas the same operation optimized in assembler has a cycle complexity of 0.5*n*(m+16). For large m this approaches a 2.4-fold reduction in cycle count (1.18/0.5 ≈ 2.36), with the fixed overhead of 8 cycles per row (0.5*16) amortized once m exceeds about 12.
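As an illustration of the benchmark operation and the two cycle models quoted above, the following is a minimal sketch in portable C. It is not the paper's code: the function names `matvec`, `cycles_c` and `cycles_asm`, the row-major layout, and the use of single-precision `float` are all assumptions made for the example. Only the two formulas are taken from the abstract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kernel: y = A * x, where A is an n-by-m matrix
 * stored row-major. The inner loop has the shape the abstract describes:
 * two memory reads, one multiply-accumulate, and pointer arithmetic. */
void matvec(const float *a, const float *x, float *y, size_t n, size_t m)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        const float *row = a + i * m;   /* start of row i */
        float acc = 0.0f;
        for (size_t j = 0; j < m; ++j)
            acc += row[j] * x[j];
        y[i] = acc;
    }
}

/* Cycle estimate quoted for the compiled C version: 1.18*n*m. */
double cycles_c(size_t n, size_t m)
{
    return 1.18 * (double)n * (double)m;
}

/* Cycle estimate quoted for the hand-optimized assembler: 0.5*n*(m+16). */
double cycles_asm(size_t n, size_t m)
{
    return 0.5 * (double)n * ((double)m + 16.0);
}
```

For a 64-by-64 matrix the models give about 4833 cycles for the C version and 2560 cycles for the assembler version, a ratio of roughly 1.9. The ratio moves toward 2.36 as m grows because the 16-cycle term is a per-row constant.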
