Resistive Associative Processor

An Associative Processor (AP) combines data storage and data processing, functioning simultaneously as a massively parallel SIMD array processor and as memory. Traditionally, APs have been based on CMOS technology, like other classes of massively parallel SIMD processors. The main component of an AP is a Content Addressable Memory (CAM) array. As CMOS feature scaling slows down, CAM faces scalability problems. In this work, we propose and investigate an AP based on resistive CAM: the Resistive AP (ReAP). We show that resistive memory technology potentially allows scaling the AP from a few million to a few hundred million processing units on a single silicon die. We compare the performance and power consumption of a ReAP to those of a CMOS AP and a conventional SIMD accelerator (GPU), and show that ReAP, although it exhibits higher power density, allows better scalability and higher performance.
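To make the idea of a CAM array acting as both memory and SIMD processor more concrete, the following is a minimal software sketch of the generic compare-and-write style of associative processing. It is an illustration under stated assumptions, not the circuit or instruction set proposed in this work: the names `ToyCAM` and `xor_columns` are hypothetical, each row stands in for one processing unit, and the Python loops stand in for what the hardware would do across all rows at once.

```python
class ToyCAM:
    """Toy content addressable memory; each row models one processing unit."""

    def __init__(self, rows):
        # rows: list of equal-length lists of bits (0 or 1)
        self.rows = [list(r) for r in rows]

    def compare(self, key, mask):
        """Tag every row whose masked columns equal the key.

        Hardware would evaluate all rows in parallel; here we loop.
        """
        return [
            all(row[c] == key[c] for c in range(len(key)) if mask[c])
            for row in self.rows
        ]

    def write(self, tags, key, mask):
        """Write the key bits into the masked columns of every tagged row."""
        for row, tagged in zip(self.rows, tags):
            if tagged:
                for c in range(len(key)):
                    if mask[c]:
                        row[c] = key[c]


def xor_columns(cam, a, b, out):
    """Compute out = a XOR b in every row, one truth-table entry per pass.

    Assumes column `out` is distinct from columns `a` and `b`, so earlier
    passes cannot change the inputs seen by later passes.
    """
    width = len(cam.rows[0])
    for bit_a, bit_b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        # Compare phase: select rows whose inputs match this truth-table entry.
        key = [0] * width
        cmp_mask = [0] * width
        key[a], key[b] = bit_a, bit_b
        cmp_mask[a] = cmp_mask[b] = 1
        tags = cam.compare(key, cmp_mask)

        # Write phase: store the result bit in the tagged rows only.
        write_key = [0] * width
        write_mask = [0] * width
        write_key[out] = bit_a ^ bit_b
        write_mask[out] = 1
        cam.write(tags, write_key, write_mask)


cam = ToyCAM([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]])
xor_columns(cam, a=0, b=1, out=2)
result = [row[2] for row in cam.rows]
```

In this sketch the number of compare/write passes depends only on the size of the truth table, not on the number of rows, which is why adding rows (processing units) increases throughput for this style of computation.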
