A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications

The ever-increasing gap between processor and memory speed is also an issue in embedded systems, because of the increased complexity of multimedia processing and the strict resource constraints of these devices. Profile-driven code optimization techniques can be effectively employed to tune the interaction between application and cache and the performance of the cache system itself. In fact, the applications running on such systems are usually known in advance and do not change over time. In a previous paper, we presented a profile-based code restructuring technique (CAT) that was able to dramatically increase cache exploitation by embedded applications. However, it is well known that profile-driven optimizations can suffer from input-sensitivity problems: an application that is optimized for a particular input can perform even worse than the original one when subjected to other inputs. In this paper we consider JPEG and MPEG compressor/decompressor applications and analyze the input-sensitivity of CAT-improved layouts over a wide range of inputs. The input sets were carefully determined through both black-box and white-box analysis of the applications. We propose two metrics for measuring the input-sensitivity of application layouts, and show how our profile-driven code transformation technique is able to reduce the input-sensitivity of the considered applications by up to 48% on caches ranging from 1 KByte to 8 KByte.
