Achieving Accurate In-Memory Neural Network Inference with Highly Overlapping Nonvolatile Memory State Distributions

Analog in-memory computing can improve the energy efficiency of deep neural network inference by orders of magnitude by exploiting the analog properties of nonvolatile memory. This places new requirements on the memory devices, which physically represent neural network weights as analog states. By carefully considering the algorithmic implications of how weights are mapped to physical states, it is possible to achieve accuracy very close to that of a digital accelerator using 40 nm embedded SONOS memory.
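
To make the idea of mapping weights to physical states concrete, the following is a minimal illustrative sketch, not the mapping used in this work. It assumes a differential scheme in which each signed weight is stored as the difference of two non-negative conductances, and it models state uncertainty as additive Gaussian noise with a hypothetical standard deviation `sigma` (in units of the maximum conductance `g_max`). The function names and parameters are assumptions introduced only for illustration.

```python
import numpy as np


def map_weights_to_conductances(W, g_max=1.0):
    """Map signed weights to a pair of non-negative conductances.

    Assumed differential mapping: W is proportional to (g_pos - g_neg),
    with the largest |weight| mapped to g_max. W must contain at least
    one nonzero entry.
    """
    scale = g_max / np.max(np.abs(W))
    g_pos = np.where(W > 0, W, 0.0) * scale
    g_neg = np.where(W < 0, -W, 0.0) * scale
    return g_pos, g_neg, scale


def analog_matvec(W, x, sigma=0.0, g_max=1.0, rng=None):
    """Matrix-vector product computed from noisy conductance states.

    sigma is a hypothetical per-device state error (assumed Gaussian
    and independent of the programmed conductance).
    """
    rng = np.random.default_rng() if rng is None else rng
    g_pos, g_neg, scale = map_weights_to_conductances(W, g_max)
    # Perturb each stored state independently to mimic overlapping
    # state distributions.
    g_pos = g_pos + rng.normal(0.0, sigma, size=g_pos.shape)
    g_neg = g_neg + rng.normal(0.0, sigma, size=g_neg.shape)
    # Subtract the two columns and rescale back to weight units.
    return ((g_pos - g_neg) @ x) / scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3))
    x = rng.normal(size=3)
    exact = W @ x
    noisy = analog_matvec(W, x, sigma=0.05, rng=rng)
    print("max abs error:", np.max(np.abs(noisy - exact)))
```

With `sigma=0.0` the sketch reproduces the exact product, so any deviation at nonzero `sigma` comes from the modeled state errors alone. A real device would likely need a state-dependent noise model in place of the constant `sigma` assumed here.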