A reconfigurable simulator for large-scale heterogeneous multicore architectures

Future general-purpose architectures will scale to hundreds of cores. To accommodate both latency-oriented and throughput-oriented workloads, such systems are likely to present a heterogeneous mix of cores. In particular, sequential code can achieve peak performance on an out-of-order core, while parallel code achieves peak throughput over a set of simple in-order (IO) or single-instruction, multiple-data (SIMD) cores. These large-scale heterogeneous architectures form a prohibitively large design space that includes not just the mix of cores but also the memory hierarchy, coherence protocol, and on-chip network (OCN). Exploring a space this large requires an easily reconfigurable multicore simulator. We build such a simulator on M5, an event-driven simulator originally targeting a network of processors.
