Today's DSP processors are so complex that it has become impossible to program them in assembly. To get the maximum performance out of applications running on such devices, very good compilers are needed. This paper analyzes the capabilities of those compilers by optimizing a compute-intensive 3D image reconstruction algorithm on the TMS320C6701 ('C67) DSP processor from Texas Instruments. Because the 'C67 is a VLIW processor, performance depends on the compiler's ability to detect parallelism. By rewriting the C source code, we made it clear to the compiler which code was free of data dependences and could therefore be executed in parallel. Over all optimizations, the average number of instructions per cycle rose from 0.41 to 2.61 (×6) and the number of instructions to be executed was divided by 3.6. The net result was a performance increase of 2200%. For every optimization step discussed, we state the problem that prevented the compiler from generating efficient code and explain how we overcame it. We show that, for many of the steps, the performance problem was caused by a lack of provisions for efficient communication between the user and the compiler. We had to trick the compiler into performing the optimizations we wanted by writing the program in the right way. This was a long and tedious process. Therefore, we look at what provisions should be added to improve this communication and to reduce development time and time to market.
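As an illustration of the kind of source-level rewrite described above, the following is a minimal sketch and not code from the paper. It assumes a hypothetical kernel (`scale_add`) and uses the standard C99 `restrict` qualifier as one possible way of telling a compiler that pointer arguments never overlap. Without that promise, the compiler has to assume that a store through `out` may change a value later read through `a` or `b`, which restricts how far it can overlap loop iterations.

```c
#include <stddef.h>

/* Hypothetical kernel, for illustration only (not taken from the paper).
 * Baseline version: out, a and b may alias, so the compiler must assume
 * that the store to out[i] can affect later loads from a[] or b[]. */
void scale_add(float *out, const float *a, const float *b,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}

/* Same computation with C99 restrict qualifiers. The caller promises that
 * the three arrays do not overlap, so the iterations are known to be free
 * of data dependences and can be scheduled in parallel. Passing
 * overlapping arrays to this version is undefined behavior. */
void scale_add_restrict(float *restrict out, const float *restrict a,
                        const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}
```

Both functions compute the same result for non-overlapping arrays. The difference lies only in what the compiler is allowed to assume, and whether a given compiler actually produces a better schedule for the second version depends on the compiler and its optimization settings.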