A limits study of benefits from nanostore-based future data-centric system architectures

The adoption of non-volatile memories (NVMs) in system architecture and the growth in data-centric workloads offer exciting opportunities for new designs. In this paper, we examine the potential and limits of designs that move compute into close proximity with NVM-based data stores. To address the challenges of evaluating such architectures for distributed systems, we develop and validate a new methodology for large-scale data-centric workloads. We then study "nanostores" as an example design that constructs distributed systems from building blocks with 3D-stacked compute and NVM layers on the same chip, replacing both traditional storage and memory with NVM. Our limits study demonstrates the significant potential of this approach (3-162X improvement in energy-delay product over 2015 baselines), particularly for IO-intensive workloads. We also discuss and quantify the impact of network bandwidth, software scalability, and power density, and we examine design tradeoffs for future NVM-based data-centric architectures.
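
For readers unfamiliar with the metric, the following minimal sketch shows how an energy-delay product (EDP) improvement ratio is conventionally computed, taking EDP as energy multiplied by execution time. The function names and all numeric values are hypothetical and chosen only for illustration; they are not measurements or code from the paper.

```python
def energy_delay_product(power_watts, runtime_seconds):
    """Return EDP in joule-seconds: (power * time) * time."""
    energy_joules = power_watts * runtime_seconds
    return energy_joules * runtime_seconds


def edp_improvement(baseline, candidate):
    """Return how many times lower the candidate's EDP is than the baseline's.

    Each argument is a (power_watts, runtime_seconds) pair.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical numbers, not taken from the paper.
baseline = (200.0, 10.0)   # 200 W for 10 s -> EDP = 20000 J*s
candidate = (100.0, 5.0)   # 100 W for 5 s  -> EDP = 2500 J*s
ratio = edp_improvement(baseline, candidate)
print(f"EDP improvement: {ratio:.1f}X")
```

Because delay enters the product twice, a design that halves both power and runtime improves EDP by 8X rather than 4X, which is why the metric rewards designs that improve performance and energy together.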
