Memory organization for improved data cache performance in embedded processors

Code generation for embedded processors creates opportunities for several performance optimizations that are not applicable to traditional compilers. We present techniques for improving data cache performance by organizing the variables declared in embedded code in memory according to the specific parameters of the data cache. Our approach clusters variables to minimize compulsory cache misses and solves the memory assignment problem to minimize conflict cache misses. Experiments with code kernels from DSP and other domains on the LSI Logic CW4001 embedded processor show that this memory organization technique significantly improves data cache performance, with an average improvement of 46% in hit ratios.
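To illustrate why variable placement affects both kinds of miss, the following is a minimal sketch of a direct-mapped cache model. It is not the paper's algorithm. The cache geometry (16-byte lines, 4 lines), the element size, the base addresses, and the helper names `count_misses` and `interleaved_trace` are all assumptions chosen for illustration and do not describe the CW4001.

```python
def count_misses(trace, line_size, num_lines):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    held = [None] * num_lines          # memory block currently held by each cache line
    misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # cache line that block maps to
        if held[index] != block:
            misses += 1
            held[index] = block
    return misses


def interleaved_trace(base_a, base_b, n, elem_size=4):
    """Addresses touched by a loop that reads a[i] then b[i] for i in 0..n-1."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_size)
        trace.append(base_b + i * elem_size)
    return trace


# Hypothetical cache parameters (assumed, not taken from the paper).
LINE_SIZE = 16
NUM_LINES = 4
CACHE_SIZE = LINE_SIZE * NUM_LINES
N = 8                                  # each array spans 2 cache lines

# Conflict misses: both arrays map to the same cache lines and evict each other.
conflicting = count_misses(
    interleaved_trace(0, CACHE_SIZE, N), LINE_SIZE, NUM_LINES)

# Shifting the second array by two lines maps it to different cache lines.
separated = count_misses(
    interleaved_trace(0, CACHE_SIZE + 2 * LINE_SIZE, N), LINE_SIZE, NUM_LINES)

# Compulsory misses: two scalars used together, in one line versus two lines.
clustered = count_misses([0, 4], LINE_SIZE, NUM_LINES)
scattered = count_misses([0, LINE_SIZE], LINE_SIZE, NUM_LINES)

print(conflicting, separated, clustered, scattered)
```

Under these assumed parameters, the layout that lets both arrays share cache lines misses on every access, while the shifted layout incurs only the first-touch miss for each line. Similarly, placing two variables that are used together in the same line needs one first-touch miss instead of two.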
