Performance estimation for task graphs combining sequential path profiling and control dependence regions

Estimating the speed-up of parallelized code is crucial for efficiently comparing different parallelization techniques or task graph transformations. Unfortunately, during the parallelization of a specification, the information that can be extracted by profiling the corresponding sequential code (e.g., the most frequently executed paths) is often not properly taken into account. In particular, correlating sequential path profiling with the corresponding parallelized code can help identify code hot spots, opening new possibilities for automatic parallelization. For this reason, starting from a well-known profiling technique, Efficient Path Profiling, we propose a methodology that estimates the speed-up of a parallelized specification using only its hierarchical task graph representation and the information obtained from dynamic profiling of the initial sequential specification. Experimental results show that the proposed solution outperforms existing approaches.
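As background on the profiling primitive mentioned above, the sketch below illustrates the path-numbering step of Ball and Larus's Efficient Path Profiling on a small acyclic control-flow graph. It is not the speed-up estimator proposed here, only the technique it starts from. It assumes a single-entry, single-exit DAG (loop back edges already removed, as in the original technique), and the helper names (`ball_larus_numbering`, `path_id`, `profile_paths`) and node labels are hypothetical, chosen for illustration.

```python
from collections import Counter


def ball_larus_numbering(succ, entry, exit_):
    """Assign Ball-Larus edge values on an acyclic CFG (assumed single entry/exit).

    succ maps each node to an ordered list of successors.
    Returns (num_paths, edge_val): num_paths[v] counts the v->exit paths and
    edge_val[(v, w)] is the increment added when edge v->w is taken.
    """
    # DFS post-order on a DAG appends every node after all of its
    # descendants, i.e. it yields a reverse topological order.
    order, seen = [], set()

    def dfs(v):
        seen.add(v)
        for w in succ.get(v, []):
            if w not in seen:
                dfs(w)
        order.append(v)

    dfs(entry)

    num_paths, edge_val = {}, {}
    for v in order:
        if v == exit_:
            num_paths[v] = 1
            continue
        num_paths[v] = 0
        for w in succ.get(v, []):
            # Each outgoing edge skips the IDs already used by earlier edges.
            edge_val[(v, w)] = num_paths[v]
            num_paths[v] += num_paths[w]
    return num_paths, edge_val


def path_id(path, edge_val):
    """Sum the edge values along an entry->exit path to get its unique ID."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


def profile_paths(executed_paths, edge_val):
    """Count how often each path ID occurs in a list of executed paths."""
    return dict(Counter(path_id(p, edge_val) for p in executed_paths))


if __name__ == "__main__":
    # Two consecutive if-then-else diamonds: 4 distinct entry->exit paths.
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
           "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}
    num_paths, edge_val = ball_larus_numbering(cfg, "A", "G")
    print(num_paths["A"])                  # 4
    trace = ["ABDEG", "ACDFG", "ABDEG", "ABDEG"]
    counts = profile_paths([list(p) for p in trace], edge_val)
    print(counts)                          # {0: 3, 3: 1}
    print(max(counts, key=counts.get))     # 0, the hot path
```

Each path receives a distinct ID in the range 0 to `num_paths[entry] - 1`, so the per-path counters can be kept in a compact array; the most frequent ID identifies the hot path that a speed-up estimate would weight most heavily.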
