A Memory Access Pattern-Based Program Profiling System for Dynamic Parallelism Prediction

Parallel computing is the simultaneous use of multiple computing resources to solve a computational problem. To execute a sequential program automatically so that its inherent parallelism can exploit the underlying multi-core platform, program profiling is needed to learn the control- and data-flow dependencies, which are critical for predicting whether parallelization is feasible and for finding parallelization solutions. In the literature, algorithms and tools have been designed for either dynamic or static program analysis to improve the accuracy and performance of discovering all aspects of the target program. However, memory overhead and runtime overhead remain two challenging problems for program profiling. In this work, we have designed and implemented, on top of the PIN framework, a memory access pattern-based program profiling system for predicting the parallelization of sequential programs. Validation results on seven SPEC 2006 benchmark programs show the effectiveness of the proposed system in terms of both memory usage and CPU time for dynamic profiling of the target program.
