Performance modeling of parallel applications on MPSoCs

In this paper we present a new technique for automatically measuring the performance of tasks, functions or arbitrary parts of a program on a multiprocessor embedded system. The technique instruments the tasks described by OpenMP, used to represent the task parallelism, while ad hoc pragmas in the source indicate other pieces of code to profile. The annotations and the instrumentation are completely target-independent, so the same code can be measured on different target architectures, on simulators or on prototypes. We validate the approach on a single and on a dual LEON 3 platform synthesized on FPGA, demonstrating a low instrumentation overhead. We show how the information obtained with this technique can be easily exploited in a Hardware/Software design space exploration tool, by estimating, with good accuracy, the speed-up of a parallel application given the profiling on the single processor prototype.

[1]  Hong Linh Truong,et al.  On Using SCALEA for Performance Analysis of Distributed and Parallel Programs , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[2]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[3]  Kingshuk Karuri,et al.  A SW performance estimation framework for early system-level-design using fine-grained instrumentation , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[4]  Francisco J. Suárez,et al.  PET, a software monitoring toolkit for performance analysis of parallel embedded applications , 2003, J. Syst. Archit..

[5]  Bernd Mohr,et al.  Design and Prototype of a Performance Tool Interface for OpenMP , 2002, The Journal of Supercomputing.

[6]  Bryan Cantrill,et al.  Dynamic Instrumentation of Production Systems , 2004, USENIX Annual Technical Conference, General Track.

[7]  J. M. Bull,et al.  Measuring Synchronisation and Scheduling Overheads in OpenMP , 2007 .

[8]  Arturo González-Escribano,et al.  The OpenMP source code repository , 2005, 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing.

[9]  Wayne H. Wolf The future of multiprocessor systems-on-chips , 2004, Proceedings. 41st Design Automation Conference, 2004..

[10]  Richard B. Bunt,et al.  Exploiting software interfaces for performance measurement , 1998, WOSP '98.