Simple offset assignment in presence of subword data

Many embedded architectures support an indirect addressing mode with autoincrement/autodecrement. By maximizing the use of this mode, the generation of explicit address arithmetic instructions can be avoided, which reduces code size and improves performance. Bartley [2] and Liao et al. [16] developed a method for finding a storage layout for program variables that maximizes the use of autoincrement/autodecrement. They introduced the Simple Offset Assignment (SOA) problem and solved it using a Path Cover (PC) formulation. We observe that many media and network processing applications make extensive use of subword data. For such applications, packing multiple subword variables into a single word yields storage layouts that further reduce the cost of address arithmetic in two ways. First, the need for address arithmetic is reduced because variables that are packed together share the same address. Second, opportunities for using autoincrement and autodecrement instructions increase because layouts become possible in which a variable is adjacent to more than two other variables. This approach has become feasible because of the recent trend in embedded processor design that allows subword variables packed together to be accessed and manipulated without incurring a performance penalty. We introduce the SubWord Offset Assignment (SWOA) problem and solve it using a Path Cover with Node Coalescing (PCwNC) formulation. Node coalescing corresponds to packing multiple subword variables into a single word, while path covering corresponds to placing variables in adjacent memory locations to enable the use of autoincrement/autodecrement. We present three heuristics to solve the PCwNC problem.
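The cost model behind SOA and the effect of packing can be illustrated with a small sketch. It is not one of the three PCwNC heuristics; it assumes a single address register, ignores the initial address load, and uses the commonly described greedy path cover on the access graph (heaviest edges first, no node of degree above two, no cycles) for the word-per-variable case. The names `address_cost`, `greedy_path_cover`, the access sequence and the packed layout are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def address_cost(access_seq, word_of):
    """Count transitions that need an explicit address arithmetic instruction.

    word_of maps each variable to the index of the memory word holding it.
    A transition is free when both variables share a word (packed subword
    data) or sit in adjacent words (autoincrement/autodecrement).
    """
    return sum(
        1
        for prev, cur in zip(access_seq, access_seq[1:])
        if abs(word_of[prev] - word_of[cur]) > 1
    )


def greedy_path_cover(access_seq):
    """Greedy path cover of the access graph, one variable per word."""
    # Edge weight = how often two variables are accessed back to back.
    weights = Counter()
    for u, v in zip(access_seq, access_seq[1:]):
        if u != v:
            weights[frozenset((u, v))] += 1

    variables = sorted(set(access_seq))
    degree = {v: 0 for v in variables}
    neighbours = {v: [] for v in variables}
    parent = {v: v for v in variables}  # union-find, used to reject cycles

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Heaviest edges first; ties broken by variable name for determinism.
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, _ in ordered:
        u, v = sorted(edge)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            neighbours[u].append(v)
            neighbours[v].append(u)

    # Walk each path from one endpoint and concatenate the paths.
    layout, seen = [], set()
    for start in variables:
        if start in seen or degree[start] == 2:
            continue
        node, prev = start, None
        while node is not None:
            layout.append(node)
            seen.add(node)
            nxt = [n for n in neighbours[node] if n != prev]
            node, prev = (nxt[0] if nxt else None), node
    return layout


# Hypothetical access sequence in which `a` is accessed next to b, c, d and e.
seq = list("abacadae")
layout = greedy_path_cover(seq)
one_per_word = {v: i for i, v in enumerate(layout)}
# Assumed packing: b and c share word 0, d and e share word 2.
packed = {"b": 0, "c": 0, "a": 1, "d": 2, "e": 2}
print(layout, address_cost(seq, one_per_word), address_cost(seq, packed))
```

With one variable per word, `a` can have at most two neighbours, so three of the seven transitions in this sequence still need explicit address arithmetic. In the assumed packed layout, `a` is adjacent to all four other variables and no transition needs it.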
Experiments show that when the program is optimized for code size, the three proposed algorithms achieve reductions of 26%, 26.9% and 32%, respectively, in the number of static explicit address arithmetic instructions over Liao et al.'s algorithm. The algorithms also achieve reductions of 14.5%, 22.1% and 22.7% in stack frame size. When the program is optimized for performance, the algorithms achieve reductions of 24.3%, 24.7% and 30.2% in the dynamic count of explicit address arithmetic instructions.

[1]  Rainer Leupers,et al.  A uniform optimization technique for offset assignment problems , 1998, Proceedings. 11th International Symposium on System Synthesis (Cat. No.98EX210).

[2]  Mark Stephenson,et al.  Bitwidth analysis with application to silicon compilation , 2000, PLDI '00.

[3]  Rajiv Gupta,et al.  Bit section instruction set extension of ARM for embedded applications , 2002, CASES '02.

[4]  S. Devadas,et al.  Analysis And Evaluation of Address Arithmetic Capabilities in Custom DSP Architectures , 1997, Proceedings of the 34th Design Automation Conference.

[5]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[6]  Tilman Wolf,et al.  CommBench-a telecommunications benchmark for network processors , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS (Cat. No.00EX422).

[7]  D. H. Bartley,et al.  Optimizing stack frame accesses for processors with restricted addressing modes , 1992, Softw. Pract. Exp..

[8]  Rajiv Gupta,et al.  Bitwidth aware global register allocation , 2003, POPL '03.

[9]  Gerhard Fettweis,et al.  A new network processor architecture for high-speed communications , 1999, 1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461).

[10]  Rajiv Gupta,et al.  Profile guided selection of ARM and thumb instructions , 2002, LCTES/SCOPES '02.

[11]  Rajiv Gupta,et al.  Enhancing the performance of 16-bit code using augmenting instructions , 2003 .

[12]  Rajiv Gupta,et al.  Enhancing the performance of 16-bit code using augmenting instructions , 2003, LCTES.

[13]  Kurt Keutzer,et al.  Storage assignment to decrease code size , 1996, TOPLAS.

[14]  Rajiv Gupta,et al.  Mixed-width instruction sets , 2003 .

[15]  Santosh Pande,et al.  Storage assignment optimizations through variable coalescence for embedded processors , 2003 .

[16]  Rajiv Gupta,et al.  A Representation for Bit Section Based Analysis and Optimization , 2002, CC.

[17]  Taewhan Kim,et al.  Address assignment combined with scheduling in DSP code generation , 2002, DAC '02.

[18]  Christopher W. Fraser,et al.  Analyzing and compressing assembly code , 1984, SIGPLAN '84.

[19]  Gilbert Wolrich,et al.  The next generation of Intel IXP network processors , 2002 .

[20]  Amit Rao,et al.  Storage assignment optimizations to generate compact and efficient code on embedded DSPs , 1999, PLDI '99.

[21]  Srinivas Devadas,et al.  Analysis and Evaluation of Address Arithmetic Capabilities in Custom DSP Architectures , 1997, Des. Autom. Embed. Syst..

[22]  Rajiv Gupta,et al.  Bitwidth aware global register allocation , 2003, POPL.

[23]  Mahmut T. Kandemir,et al.  Address Register Assignment for Reducing Code Size , 2003, CC.

[24]  Rainer Leupers,et al.  C Compiler Design for an Industrial Network Processor , 2001, OM '01.

[25]  Rainer Leupers,et al.  Algorithms for address assignment in DSP code generation , 1996, ICCAD 1996.

[26]  Wendong Hu,et al.  NetBench: a benchmarking suite for network processors , 2001, IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281).

[27]  Chaitali Chakrabarti,et al.  Address code generation for digital signal processors , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[28]  Rajiv Gupta,et al.  Code Compaction of Matching Single-Entry Multiple-Exit Regions , 2003, SAS.

[29]  Rajiv Gupta,et al.  Mixed-width instruction sets , 2003, CACM.