Accelerating Architectural Simulation Via Statistical Techniques: A Survey

In computer architecture research and development, simulation is a powerful way of observing and predicting processor behavior. Although architectural simulation is widely used for computer performance evaluation, design space exploration, and computer architecture assessment, it remains computationally expensive in practice. Specifically, the total simulation time is determined by the simulator's raw speed and the total number of simulated instructions. Simulator speed can be improved by enhanced simulation infrastructures (e.g., simulators with high-level abstraction, parallel simulators, and hardware-assisted simulators). Orthogonal to this line of work, recent studies have significantly reduced the total number of simulated instructions at the cost of a slight loss of accuracy. Interestingly, we observe that most of these studies are built upon statistical techniques. This survey presents a comprehensive review of such studies and proposes a taxonomy based on the sources of reduction. In addition to identifying the similarities and differences among state-of-the-art approaches, we further discuss insights gained from these studies as well as implications for future research.
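The claim that total simulation time depends on simulator speed and instruction count can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only. The function names, the 1 MIPS simulator speed, the 10^12-instruction workload, and the sample sizes are hypothetical. It also ignores the cost of fast-forwarding and warming microarchitectural state between samples, which real sampling methodologies must account for. Assuming simple random sampling of equal-sized units and a normal approximation, it further shows how a statistical approach replaces an exact CPI with an estimate plus a confidence interval.

```python
import math
import statistics
from typing import Sequence, Tuple


def simulation_time_seconds(num_instructions: float, speed_ips: float) -> float:
    """Total detailed-simulation time = simulated instructions / simulator speed.

    speed_ips is the simulator's raw speed in instructions per second
    (hypothetical parameter name).
    """
    return num_instructions / speed_ips


def sampled_cpi_estimate(unit_cpis: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Estimate mean CPI from sampled units (hypothetical helper).

    Assumes simple random sampling of equal-sized units and uses the normal
    approximation: half_width = z * s / sqrt(n), where s is the sample
    standard deviation. z = 1.96 corresponds to roughly 95% confidence.
    Returns (mean_cpi, half_width_of_confidence_interval).
    """
    n = len(unit_cpis)
    if n < 2:
        raise ValueError("need at least two sampled units to estimate variance")
    mean = statistics.fmean(unit_cpis)
    s = statistics.stdev(unit_cpis)
    return mean, z * s / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical workload: 10^12 instructions on a 1 MIPS detailed simulator.
    full = simulation_time_seconds(1e12, 1e6)
    # Hypothetical sampling plan: 10,000 units of 1,000 instructions each.
    sampled = simulation_time_seconds(10_000 * 1_000, 1e6)
    print(f"full: {full:.0f} s, sampled (detailed part only): {sampled:.0f} s")

    mean, half = sampled_cpi_estimate([1.0, 1.2, 0.8, 1.0])
    print(f"CPI = {mean:.3f} +/- {half:.3f}")
```

With these hypothetical numbers, a full detailed run takes 10^6 s (about 11.6 days), whereas detailed simulation of the sampled units covers only 10^7 instructions, or 10 s at the same speed. The price is that CPI becomes an estimate whose confidence interval narrows as the number of sampled units grows.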
