Tree-based machine learning performed in-memory with memristive analog CAM

Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains because they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNNs). However, these models are difficult to optimize for fast inference at scale without accuracy loss in von Neumann architectures due to non-uniform memory access patterns. Recently, we proposed a novel analog content addressable memory (CAM) based on emerging memristor devices for fast look-up table operations. Here, we propose for the first time to use the analog CAM as an in-memory computational primitive to accelerate tree-based model inference. We demonstrate an efficient mapping algorithm that leverages the new analog CAM capabilities such that each root-to-leaf path of a Decision Tree is programmed into a row. This new in-memory compute concept enables few-cycle model inference, increasing throughput by 10³× over conventional approaches.
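The mapping idea can be illustrated with a minimal software sketch. It is not the authors' algorithm or hardware model: it assumes each analog CAM cell stores a (lower, upper] range for one feature, that an unbounded range acts as a "don't care", and that a row matches when every feature falls inside its cell's range. The toy tree `TREE` and the helper names `tree_to_rows` and `cam_search` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy tree: internal nodes test x[feature] <= threshold.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": "A"},
    "right": {
        "feature": 1, "threshold": 0.3,
        "left": {"leaf": "B"},
        "right": {"leaf": "C"},
    },
}

def tree_to_rows(node, n_features, bounds=None):
    """Turn every root-to-leaf path into one row of per-feature ranges."""
    if bounds is None:
        # Unbounded ranges play the role of "don't care" cells (assumption).
        bounds = [(-math.inf, math.inf)] * n_features
    if "leaf" in node:
        return [(list(bounds), node["leaf"])]
    f, t = node["feature"], node["threshold"]
    lo, hi = bounds[f]
    left = list(bounds)
    left[f] = (lo, min(hi, t))    # branch taken when x[f] <= t
    right = list(bounds)
    right[f] = (max(lo, t), hi)   # branch taken when x[f] > t
    return (tree_to_rows(node["left"], n_features, left)
            + tree_to_rows(node["right"], n_features, right))

def cam_search(rows, x):
    """Return the label of the row whose ranges all contain the query x.

    A CAM would compare all rows in parallel; this loop only models
    the result of that comparison, not its timing.
    """
    for ranges, label in rows:
        if all(lo < v <= hi for (lo, hi), v in zip(ranges, x)):
            return label
    return None

rows = tree_to_rows(TREE, n_features=2)
print(len(rows), cam_search(rows, [0.7, 0.1]))
```

Because the paths of a decision tree partition the feature space, exactly one row matches any query in this sketch, so the whole tree traversal reduces to a single range-match lookup.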
