Loop Block Profiling with Performance Prediction

With the increase in the complexity of High Performance Computing (HPC) systems, the complexity of applications has increased as well. To achieve better performance by effectively exploiting the parallelism offered by HPC architectures, we need to analyze and identify parameters of a program such as its code hotspot (kernel) and execution time. It is commonly observed that a program spends about 90% of its execution time in less than 10% of its code. If we can optimize even a small portion of that 10% of the code which consumes 90% of the execution time, we have a high probability of achieving better performance. We must therefore find the bottleneck, i.e. the part of the code that dominates the running time, usually called the hotspot. Profiling answers the question of which portions of the code should be optimized or parallelized to achieve better performance. In this work we develop a light-weight profiler that identifies which portions of the code are hotspots and estimates the maximum speedup that could be achieved if the hotspot were parallelized.

Keywords—Profiling, Loop Block Profile, Code Analysis, Performance Prediction, Speedup Estimation
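A maximum-speedup estimate for a profiled hotspot is conventionally bounded by Amdahl's law. The sketch below is a minimal illustration, not the paper's implementation: it assumes the profiler has already reported the fraction p of total runtime spent in the hotspot loop block (the 90% figure and the function name amdahl_speedup are assumptions for illustration) and computes the resulting speedup bound for several core counts.

    #include <stdio.h>

    /* Amdahl's law: if a fraction p of the runtime is parallelizable
     * across n cores, the overall speedup is at most 1 / ((1 - p) + p / n). */
    static double amdahl_speedup(double parallel_fraction, unsigned cores)
    {
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
    }

    int main(void)
    {
        double p = 0.90; /* assumed: hotspot accounts for 90% of total runtime */

        for (unsigned n = 2; n <= 64; n *= 2)
            printf("cores=%2u  max speedup=%.2f\n", n, amdahl_speedup(p, n));
        return 0;
    }

With p = 0.90 the bound saturates near 10x regardless of how many cores are added, which is why accurately identifying how large a fraction of the runtime the hotspot covers matters as much as parallelizing it.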
