Reducing write activities on non-volatile memories in embedded CMPs via data migration and recomputation

Recent advances in circuit and process technologies have pushed non-volatile memory technologies into a new era. These technologies exhibit appealing properties such as low power consumption, non-volatility, shock resistance, and high density. However, several challenges must be addressed before non-volatile memories can serve as main memory in computer systems. First, non-volatile memories endure only a limited number of write/erase cycles compared with DRAM. Second, writes to non-volatile memory are more expensive than writes to DRAM in terms of both energy consumption and access latency. Both challenges can be mitigated by reducing write activity on the non-volatile memory. In this paper, we target embedded Chip Multiprocessors (CMPs) with Scratch Pad Memory (SPM) and non-volatile main memory. We introduce data migration and recomputation techniques to reduce the number of write activities on non-volatile memories. Experimental results show that the proposed methods reduce the number of writes by 59.41% on average, which means that the non-volatile memory can last 2.8 times as long as before. Meanwhile, the finish time of programs is reduced by 31.81% on average.
