NearZero: An Integration of Phase Change Memory with Multi-Core Coprocessor

Multi-core coprocessors have become powerful research vehicles for analyzing large amounts of data. Even though they can accelerate data processing with a hundred cores, the data unfortunately reside on an external storage device. This separation of computation and storage introduces redundant memory copies and unnecessary data transfers across physical device boundaries, which limit the benefits of coprocessor-accelerated data processing. In addition, the coprocessors need assistance from host-side resources to access the external storage, which can require additional system context switches. To address these challenges, we propose NearZero, a novel DRAM-less coprocessor architecture that precisely integrates a state-of-the-art phase change memory into its multi-core accelerator. In this work, we implement an FPGA-based memory controller that extracts important device parameters from real phase change memory chips, and apply them to a commercially available hardware platform that employs multiple processing elements over a PCIe fabric. The evaluation results reveal that NearZero achieves, on average, 47 percent better performance than advanced coprocessor approaches that use direct I/Os (between storage and coprocessors), while consuming only 19 percent of the total energy of such advanced coprocessors.
