Design of Hardware-Friendly Memory Enhanced Neural Networks

Neural networks with external memories have been shown to mitigate catastrophic forgetting, a major problem in applications such as lifelong and few-shot learning. However, such memory enhanced neural networks (MENNs) typically require a large number of floating-point cosine distance calculations to perform the necessary attentional operations, which greatly increases energy consumption and hardware cost. This paper investigates alternative distance metrics that enable more efficient hardware implementations of MENNs. We propose using content addressable memories (CAMs) to accelerate and simplify the attentional operations. Our hardware-friendly approach implements fixed-point L∞ distance calculations via ternary content addressable memories (TCAMs) and fixed-point L1 and L2 distance calculations on a general-purpose graphics processing unit (GPGPU). As a representative example, a 32-bit floating-point cosine-distance MENN requiring M•D multiplications achieves 99.06% accuracy on the Omniglot 5-way 5-shot classification task. With our approach, at just 4-bit fixed-point precision, an L∞-L1 distance implementation achieves 90.35% accuracy using only 16 TCAM lookups and 16•D addition and subtraction operations. At 4-bit precision with an L∞-L2 distance, classification accuracies of 96.00% are possible, requiring 16 TCAM lookups and 16•D multiplication operations. Assuming the hardware memory has 512 entries, this reduces the number of multiplication operations by 32x versus the cosine-distance approach.
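To make the two-stage lookup described above concrete, the NumPy sketch below emulates it in software: a coarse L∞ (Chebyshev) search stands in for the TCAM lookups and returns 16 candidate memory slots, which are then refined with an L1 or L2 distance over only those candidates. This is a minimal illustration under assumed parameters (M = 512 entries, D = 64 dimensions, 4-bit fixed point); the function names (quantize, linf_prefilter, refine) are ours, not from the paper, and no TCAM hardware behavior is modeled beyond the distance metric itself.

```python
# Minimal software sketch of the L∞-prefilter + L1/L2-refine lookup.
# Assumed, illustrative parameters: M = 512 memory entries, D = 64, 4-bit keys.
import numpy as np

def quantize(x, bits=4):
    """Map values in [-1, 1] to signed fixed point with the given bit width."""
    scale = 2 ** (bits - 1) - 1
    return np.clip(np.round(x * scale), -scale, scale)

def linf_prefilter(memory_q, query_q, k=16):
    """Stage 1: L∞ distance (the operation a range-encoded TCAM can serve);
    return the indices of the k closest memory entries."""
    d_inf = np.max(np.abs(memory_q - query_q), axis=1)
    return np.argsort(d_inf)[:k]

def refine(memory_q, query_q, candidates, metric="l2"):
    """Stage 2: rank the k candidates with L1 (additions/subtractions only)
    or L2 (k*D multiplications), as would run on a GPGPU datapath."""
    diff = memory_q[candidates] - query_q
    if metric == "l1":
        d = np.sum(np.abs(diff), axis=1)
    else:
        d = np.sum(diff * diff, axis=1)
    return candidates[np.argmin(d)]

# Toy usage with random keys and a random query.
rng = np.random.default_rng(0)
memory = quantize(rng.uniform(-1, 1, size=(512, 64)))
query = quantize(rng.uniform(-1, 1, size=64))
cand = linf_prefilter(memory, query, k=16)       # 16 "TCAM" hits
best = refine(memory, query, cand, metric="l2")  # 16*D multiplications
print("best matching memory slot:", best)
```

In this sketch the multiplication count drops from M•D (cosine over all 512 entries) to 16•D, matching the 32x reduction quoted in the abstract; the L1 variant avoids multiplications in the refinement stage entirely.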
