An Integrated Performance Estimation Approach in a Hybrid Simulation Framework

The increasing complexity of today’s embedded multimedia and wireless devices has ushered in the era of heterogeneous MultiProcessor System-on-Chip (MPSoC) architectures. This trend, in turn, has made software parallelization and optimization a subject of utmost importance for today’s systems. Efficient software implementations are now not only mandatory for final products, but also necessary for Design Space Exploration (DSE) of the numerous hardware choices available in MPSoC development. Unfortunately, such co-exploration of different software solutions and hardware architectures usually requires an extraordinarily large effort because of the continually widening gap between the speed of real processor hardware and that of the available instruction set simulators. This problem can be greatly alleviated by using cycle-approximate but fast simulation models for early DSE, where the relative merits of different design solutions matter more than full cycle accuracy. To address the issue of fast performance estimation for DSE, HySim has been proposed: a hybrid simulation framework that consists of an Instruction Set Simulator (ISS) and a native execution engine called the Virtual CoProcessor (VCP). Virtualization is performed so that parts of the execution can be shifted to the native engine without sacrificing the functional correctness of the whole application. The VCP offers high execution speed together with performance estimation, so that, in combination with the ISS, good accuracy can be obtained. Our previous work introduced two performance estimation approaches: annotation-based and dynamic-profiling-based. However, this raised new questions about how to combine the two approaches and how to partition an application effectively between the VCP and the ISS.
In this paper, annotation-based performance estimation is used to facilitate dynamic profiling, and a preliminary sampling approach is introduced that opens the possibility of further reducing dynamic profiling overhead through interpolation and extrapolation.
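The idea of reducing profiling overhead by sampling can be illustrated with a minimal sketch. It is not the HySim implementation; it only assumes that a few invocations of a function are measured accurately (for example on the ISS) and that cycle counts for the remaining invocations are filled in by linear interpolation between neighbouring samples, holding the nearest sample outside the sampled range. The function name `interpolate_cycles` and the data layout are hypothetical.

```python
def interpolate_cycles(samples, n_calls):
    """Estimate per-invocation cycle counts from a few measured samples.

    samples: dict mapping invocation index -> measured cycle count
             (hypothetical layout, e.g. invocations profiled accurately).
    n_calls: total number of invocations to estimate.
    Unsampled invocations are linearly interpolated between the nearest
    samples; outside the sampled range the nearest sample is held
    (a deliberately simple form of extrapolation).
    """
    if not samples:
        raise ValueError("at least one measured sample is required")
    indices = sorted(samples)
    estimates = []
    for i in range(n_calls):
        if i in samples:
            estimates.append(float(samples[i]))
            continue
        lo = max((j for j in indices if j < i), default=None)
        hi = min((j for j in indices if j > i), default=None)
        if lo is None:
            # Before the first sample: hold the first measured value.
            estimates.append(float(samples[hi]))
        elif hi is None:
            # After the last sample: hold the last measured value.
            estimates.append(float(samples[lo]))
        else:
            # Between two samples: linear interpolation.
            t = (i - lo) / (hi - lo)
            estimates.append(samples[lo] + t * (samples[hi] - samples[lo]))
    return estimates


# Two measured invocations (index 0 and 4) out of six in total.
estimated = interpolate_cycles({0: 100, 4: 200}, 6)
total_cycles = sum(estimated)
```

In this sketch only two of six invocations need accurate profiling; whether such estimates are accurate enough depends on how regularly the cycle count varies between invocations.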
