String Figure: A Scalable and Elastic Memory Network Architecture

Author(s): Ogleari, Matheus Almeida; Yu, Ye; Qian, Chen; Miller, Ethan L; Zhao, Jishen; IEEE
