13.4 A 22nm 1Mb 1024b-Read and Near-Memory-Computing Dual-Mode STT-MRAM Macro with 42.6GB/s Read Bandwidth for Security-Aware Mobile Devices

Many security-aware mobile devices, which use the secure hash algorithm (SHA) or the advanced encryption standard (AES) for data encryption, require a short read-access time (tAC) and wide-IO from nonvolatile memory (NVM) to achieve a high read bandwidth and to support SHA/AES shift/rotate functions. STT-MRAM is the major on-chip NVM for advanced process nodes [2]–[6]; however, its small TMR ratio requires small-offset sense amplifiers (SAs) for robust reads, at the expense of a large area overhead and high read energy (ERD). As Fig. 13.4.1 shows, designing STT-MRAM macros for security-related applications imposes three main challenges. (1) Using a large number of SAs for wide parallel-IO readout achieves a short tAC, but results in a high peak current (IPEAK) and a large area overhead; using fewer SAs for sequential wide-IO readout reduces IPEAK and area overhead, but imposes a long tAC and a low read bandwidth (BWR). (2) MRAM macros with a high IPEAK degrade the supply (VDD) integrity of the chip, often causing failures in noise-sensitive blocks on the same chip. (3) A conventional memory-logic-separated scheme imposes a long latency of 2 cycles (a wide-IO memory read followed by a flip-flop (FF) shift/rotate) for NVM-based security logic operations. This paper presents a multibit current-mode SA (MB-CSA) that achieves a high BWR with a short tAC and a low IPEAK. It also presents a near-memory computing (NMC) unit with 1-cycle access to speed up computing for security applications. The result is a 22nm 1Mb STT-MRAM macro with dual-mode operation: wide-IO memory and NMC. The proposed 1Mb macro demonstrates the widest data output per access (1024b), with a tAC of 2.75ns at a 0.85V supply. In memory mode, it outperforms all previously reported NVM macros in BWR (42.67GB/s) and ERD (0.23pJ/b). This work also presents the first MRAM macro with NMC functionality, achieving a 33.3% reduction in logic area and only 170ps of latency after NVM access for 1b shift/rotate operations.
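As a quick consistency check on the memory-mode figures, the arithmetic below converts the 1024b output width and the reported BWR into the implied time per wide read, and converts the ERD into energy per full-width access. This is an illustrative sketch, not taken from the paper: it assumes 1 GB/s = 10^9 bytes/s and that BWR counts one full 1024b word per read access, and the function names are hypothetical.

```python
# Illustrative arithmetic using figures quoted in the text.
# Assumptions: 1 GB/s = 1e9 bytes/s, and BWR counts one full
# 1024b word per read access.

IO_WIDTH_BITS = 1024          # data-out width per read (from the text)
BW_READ_GB_PER_S = 42.67      # reported read bandwidth, BWR (from the text)
E_READ_PJ_PER_BIT = 0.23      # reported read energy, ERD (from the text)


def implied_access_period_ns(io_bits: int, bw_gb_per_s: float) -> float:
    """Time per wide read implied by a bandwidth figure.

    1 GB/s equals 1 byte/ns, so bytes / (GB/s) yields nanoseconds.
    """
    bytes_per_access = io_bits / 8
    return bytes_per_access / bw_gb_per_s


def energy_per_access_pj(io_bits: int, e_pj_per_bit: float) -> float:
    """Read energy for one full-width access."""
    return io_bits * e_pj_per_bit


if __name__ == "__main__":
    period = implied_access_period_ns(IO_WIDTH_BITS, BW_READ_GB_PER_S)
    energy = energy_per_access_pj(IO_WIDTH_BITS, E_READ_PJ_PER_BIT)
    print(f"implied period per 1024b read: {period:.2f} ns")   # about 3.00 ns
    print(f"energy per 1024b read: {energy:.2f} pJ")           # about 235.52 pJ
```

Under these assumptions, 128 bytes per access at 42.67GB/s corresponds to roughly one 1024b read every 3ns, and each full-width read costs about 235pJ.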
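The shift/rotate functions that the NMC unit accelerates are simple bit permutations of a wide word. The following functional model is an illustrative sketch of what those operations compute, not a description of the macro's circuit; `rotr`, `rotl`, and `shl` are hypothetical helper names. It shows a 1b rotate and a 1b shift on a 1024b word, plus the 32-bit rotations used by SHA-256 (ROTR) and AES key expansion (RotWord, a left rotate by 8 bits).

```python
# Functional model of wide-word shift/rotate (illustrative only; it does not
# describe the NMC circuit). Helper names are hypothetical.


def _mask(width: int) -> int:
    """All-ones mask for a width-bit word."""
    return (1 << width) - 1


def rotr(x: int, n: int, width: int) -> int:
    """Rotate the width-bit word x right by n bits."""
    n %= width
    x &= _mask(width)
    return ((x >> n) | (x << (width - n))) & _mask(width)


def rotl(x: int, n: int, width: int) -> int:
    """Rotate the width-bit word x left by n bits (a right rotate by width - n)."""
    return rotr(x, width - (n % width), width)


def shl(x: int, n: int, width: int) -> int:
    """Logical left shift; bits shifted past the MSB are dropped."""
    return (x << n) & _mask(width)


if __name__ == "__main__":
    msb = 1 << 1023                              # 1024b word with only the MSB set
    print(rotl(msb, 1, 1024) == 1)               # True: MSB wraps to LSB
    print(shl(msb, 1, 1024) == 0)                # True: MSB is discarded
    print(hex(rotr(0x12345678, 8, 32)))          # 0x78123456 (SHA-256 style ROTR)
    print(hex(rotl(0x09CF4F3C, 8, 32)))          # 0xcf4f3c09 (AES RotWord)
```

In a memory-logic-separated scheme, each such operation is applied in a separate FF stage after the wide-IO read completes, which accounts for the 2-cycle latency; performing it in an NMC unit immediately after the NVM access is what allows 1-cycle access.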