A bandwidth accurate, flexible and rapid simulating multi-HMC modeling tool

Driven by the demand for ever-increasing computing performance, a steadily widening performance gap between memory and processor architectures has emerged. In attempts to mitigate its effects on processing systems that already face the exascale barrier and beyond, energy-efficient computing has been identified as the critical topic for further scaling. Memory architectures, long known to be slow, energy-hungry and cost-intensive, require novel approaches that increase both energy efficiency and bandwidth. A quick fix for the performance aspect appears to be 3D stacking of such planar memories, which is available in the form of the High Bandwidth Memory (HBM) and the Hybrid Memory Cube (HMC). Because the latter allows custom logic to be embedded, novel non-von Neumann architectures can be realized, overcoming the performance gap and opening a new path for scaling computing performance. Considering the broad spectrum of custom logic that could be integrated into a mesh of HMCs, comprehensive modeling tools are required that enable holistic design-space explorations of computing systems in breadth and depth. To meet this demand, an HMC-modeling tool was implemented that provides rapid simulation of multiple interconnected HMCs and can run in either a functional or a bandwidth-accurate mode. Since flexibility is key for subsequent studies, the HMC-modeling tool is parameterizable and its internal components can be adjusted.
