Fast architecture evaluation of heterogeneous MPSoCs by host-compiled simulation

Many domain-specific MPSoCs are heterogeneous and tiled by nature. Evaluating important architectural decisions for future designs with 100 to 1000 cores, such as the tile structure and the core selection within each tile, requires fast and flexible simulation approaches. Cycle-accurate simulation and co-simulation based on simulator coupling are therefore unsuitable. In this paper, we evaluate heterogeneous tiled MPSoCs using a timing-approximate simulation approach. The approach specifically targets applications with highly dynamic thread and workload distributions and resource-aware program behavior, in which the application itself may decide which set of resources to claim depending on run-time status information of the resources (e.g., temperature, load). To verify performance goals of the heterogeneous MPSoC in addition to functional correctness, the approach combines a discrete-event host-compiled simulation with a time-warping mechanism that scales the execution times elapsed on the simulation host to the simulated target. It allows phases of thread (re-)distribution and resource awareness to be investigated with appropriate accuracy. Selected case studies show how quickly architectural parameters can be varied, enabling the exploration of different designs with respect to cost, performance, and other design objectives.
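The idea of scaling host execution times to a simulated target can be illustrated with a minimal sketch. It is not the implementation from the paper: the names `warp` and `Simulator`, the clock-frequency ratio, and the `cpi_ratio` correction factor are all assumptions chosen for illustration, and the actual time-warping mechanism may use a different scaling model. The host duration is passed in as a plain number so the example stays deterministic.

```python
import heapq


def warp(host_seconds, host_hz, target_hz, cpi_ratio=1.0):
    """Scale a duration measured on the host to an estimated target duration.

    Assumption: the target time grows with the ratio of clock frequencies,
    optionally corrected by a cycles-per-instruction ratio (hypothetical).
    """
    return host_seconds * (host_hz / target_hz) * cpi_ratio


class Simulator:
    """Tiny discrete-event kernel: events are processed in simulated-time order."""

    def __init__(self):
        self.now = 0.0      # current simulated time in seconds
        self._queue = []    # min-heap of (time, sequence number, name)
        self._seq = 0       # tie-breaker for events with equal time

    def schedule(self, delay, name):
        """Register the completion of `name` after `delay` simulated seconds."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, name))
        self._seq += 1

    def run(self):
        """Advance simulated time event by event and return the completion log."""
        finished = []
        while self._queue:
            time, _, name = heapq.heappop(self._queue)
            self.now = time
            finished.append((name, time))
        return finished


# In a real setup this would be measured around the host-compiled task,
# for example with time.perf_counter(); here it is fixed to 1 ms.
host_seconds = 0.001

sim = Simulator()
# The same task mapped to two different (hypothetical) target cores,
# simulated on a 3 GHz host.
sim.schedule(warp(host_seconds, host_hz=3e9, target_hz=1e9), "task on 1 GHz core")
sim.schedule(warp(host_seconds, host_hz=3e9, target_hz=5e8), "task on 500 MHz core")
finished = sim.run()

for name, time in finished:
    print(f"{name} finished at {time * 1e3:.1f} ms simulated time")
```

Under these assumptions, exploring a different core type amounts to changing `target_hz` or `cpi_ratio` and rerunning; the task itself is not recompiled for the target.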
