A Physical-Aware Framework for Memory Network Design Space Exploration

In the era of big data, demand for server memory capacity and performance keeps growing. Memory networks are a promising alternative that provides high bandwidth and low latency through distributed memory nodes connected by high-speed interconnects. However, most existing memory network designs are developed at a purely logical level and ignore the physical impact of network interconnect latency, processor placement, and the interplay between processor and memory. In this work, we propose a physical-aware framework for memory network design space exploration, which facilitates the design of an energy-efficient, physical-aware memory network system. Experimental results on various workloads show that the proposed framework can help customize the network topology, with significant improvements on various design metrics compared to other commonly used topologies.
