DUAL: Acceleration of Clustering Algorithms using Digital-based Processing In-Memory

Today’s applications generate large amounts of data that must be processed by learning algorithms. In practice, most of this data is unlabeled, so unsupervised learning methods, clustering in particular, are the most commonly used algorithms for data analysis. However, running clustering algorithms on traditional cores results in high energy consumption and slow processing because of the large amount of data movement between memory and processing units. In this paper, we propose DUAL (Digital-based Unsupervised learning AcceLeration), which supports a wide range of popular clustering algorithms on conventional crossbar memory. Instead of working with the original data, DUAL maps all data points into high-dimensional space, replacing complex clustering operations with memory-friendly operations. We accordingly design a processing in-memory (PIM) architecture that supports all essential operations in a highly parallel and scalable way and performs them in place, so that data points remain in memory. We have evaluated DUAL on several popular clustering algorithms over a wide range of large-scale datasets. Our evaluation shows that DUAL provides clustering quality comparable to that of existing clustering algorithms while using a binary representation and a simplified distance metric. DUAL also provides a 58.8× speedup and a 251.2× energy efficiency improvement compared to the state-of-the-art solution running on a GPU.
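To make the idea of replacing clustering arithmetic with memory-friendly operations concrete, the following is a minimal software sketch, not DUAL's actual encoder or hardware mapping. It assumes random sign projections as a stand-in for the high-dimensional mapping, and then runs a k-means-style loop that uses only XOR/popcount Hamming distance and bitwise majority voting. All function names and parameters (`make_encoder`, `hamming_kmeans`, `dim`, and so on) are hypothetical.

```python
import random


def make_encoder(n_features, dim, seed=0):
    """Return a function mapping a feature vector to a dim-bit binary code.

    Assumption: random Gaussian sign projections stand in for the
    high-dimensional mapping; this is not the encoder used by DUAL.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
              for _ in range(dim)]

    def encode(x):
        bits = 0
        for plane in planes:
            dot = sum(w * v for w, v in zip(plane, x))
            bits = (bits << 1) | int(dot >= 0)  # one bit per projection
        return bits

    return encode


def hamming(a, b):
    """Hamming distance between two codes: XOR followed by popcount."""
    return bin(a ^ b).count("1")


def majority(codes, dim):
    """Bitwise majority vote over a non-empty list of codes."""
    out = 0
    for i in range(dim):
        ones = sum((c >> i) & 1 for c in codes)
        if 2 * ones >= len(codes):
            out |= 1 << i
    return out


def hamming_kmeans(codes, k, dim, iters=10):
    """k-means-style clustering of binary codes under Hamming distance."""
    # Deterministic farthest-first initialization (an illustrative choice).
    centroids = [codes[0]]
    while len(centroids) < k:
        centroids.append(
            max(codes, key=lambda c: min(hamming(c, m) for m in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: hamming(c, centroids[j]))
                      for c in codes]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(codes, labels) if lab == j]
            if members:
                centroids[j] = majority(members, dim)
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic groups of 2-D points pointing in opposite directions.
    points = [(5 + rng.uniform(-0.5, 0.5), 5 + rng.uniform(-0.5, 0.5))
              for _ in range(5)]
    points += [(-5 + rng.uniform(-0.5, 0.5), -5 + rng.uniform(-0.5, 0.5))
               for _ in range(5)]
    encode = make_encoder(n_features=2, dim=256)
    print(hamming_kmeans([encode(p) for p in points], k=2, dim=256))
```

Note that sign projections preserve angular similarity rather than Euclidean distance, so this sketch only separates groups that differ in direction; it illustrates why XOR, popcount, and majority voting suffice once the data is binary, and makes no claim about the accuracy or performance figures reported above.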
