NCAM: Near-Data Processing for Nearest Neighbor Search

Emerging classes of computer vision applications demand unprecedented computational resources and operate on large amounts of data. In particular, k-nearest neighbors (kNN), a cornerstone algorithm in these applications, incurs significant data movement. To address this challenge, the underlying architecture and memory subsystems must vertically evolve to address memory bandwidth and compute demands. To enable large-scale computer vision, we propose a new class of associative memories called NCAMs which encapsulate logic with memory to accelerate k-nearest neighbors. We estimate that NCAMs can improve the performance of kNN by orders of magnitude over the best off-the-shelf software libraries (e.g., FLANN) and commodity platforms (e.g., GPUs).

[1]  Ashish Goel,et al.  Similarity search and locality sensitive hashing using ternary content addressable memories , 2010, SIGMOD Conference.

[2]  Dave Brown,et al.  Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing , 2013 .

[3]  Steven Swanson,et al.  Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.

[4]  Steven M. LaValle,et al.  Improving Motion-Planning Algorithms by Efficient Nearest-Neighbor Searching , 2007, IEEE Transactions on Robotics.

[5]  Leif Azzopardi,et al.  How many results per page?: A Study of SERP Size, Search Behavior and User Experience , 2015, SIGIR.

[6]  Onur Mutlu,et al.  Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[7]  Bernd Girod,et al.  Outdoors augmented reality on mobile phone using loxel-based visual feature organization , 2008, MIR '08.

[8]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Philip S. Yu,et al.  Top 10 algorithms in data mining , 2007, Knowledge and Information Systems.

[11]  Duncan G. Elliott,et al.  Computational RAM: Implementing Processors in Memory , 1999, IEEE Des. Test Comput..

[12]  Jiwen Lu,et al.  Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[15]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[16]  Jian Sun,et al.  K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[18]  Sudhakar Yalamanchili,et al.  Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[19]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[20]  Wei Zhao Predictive technology modeling for scaled CMOS , 2009 .

[21]  Yu Wang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[22]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[23]  Pentti Kanerva,et al.  Sparse Distributed Memory , 1988 .

[24]  Nicu Sebe,et al.  Similarity Matching in Computer Vision and Multimedia , 2008, Comput. Vis. Image Underst..

[25]  Peter M. Kogge,et al.  Combined DRAM and logic chip for massively parallel systems , 1995, Proceedings Sixteenth Conference on Advanced Research in VLSI.

[26]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[27]  Eby G. Friedman,et al.  AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.

[28]  Ronald G. Dreslinski,et al.  Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.

[29]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[30]  David J. Fleet,et al.  Hamming Distance Metric Learning , 2012, NIPS.

[31]  Pascal Fua,et al.  LDAHash: Improved Matching with Smaller Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Alexandr Andoni,et al.  Practical and Optimal LSH for Angular Distance , 2015, NIPS.

[33]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[34]  Richard M. Russell,et al.  The CRAY-1 computer system , 1978, CACM.

[35]  Michael F. Deering,et al.  FBRAM: a new form of memory optimized for 3D graphics , 1994, SIGGRAPH.

[36]  Gabriel H. Loh,et al.  Thermal analysis of a 3D die-stacked high-performance microprocessor , 2006, GLSVLSI '06.

[37]  Kang G. Shin,et al.  Scalable Hardware Priority Queue Architectures for High-Speed Packet Switches , 2000, IEEE Trans. Computers.

[38]  Du Tran,et al.  Human Activity Recognition with Metric Learning , 2008, ECCV.

[39]  Jitendra Malik,et al.  Matching Shapes , 2001, ICCV.

[40]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[41]  Jinyoung Lee,et al.  Biscuit: A Framework for Near-Data Processing of Big Data Workloads , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[42]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[43]  Ying Liu,et al.  A survey of content-based image retrieval with high-level semantics , 2007, Pattern Recognit..

[44]  B C Durga prasad,et al.  Synthesis of a TI MSP430 microcontroller core using Multi-Voltage methodology , 2010, 2010 INTERNATIONAL CONFERENCE ON COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES.

[45]  Quan Chen,et al.  DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[46]  Ninghui Sun,et al.  DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[47]  Mike Ignatowski,et al.  TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.

[48]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[49]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[50]  James D Roberts PROXIMITY CONTENT-ADDRESSABLE MEMORY:AN EFFICIENT EXTENSION TO k-NEAREST NEIGHBORS SEARCH (M.S. Thesis) , 1990 .

[51]  Yacov Hel-Or,et al.  Ultra-Fast Similarity Search Using Ternary Content Addressable Memory , 2015, DaMoN.

[52]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[53]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Svetlana Lazebnik,et al.  Locality-sensitive binary codes from shift-invariant kernels , 2009, NIPS.

[55]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[56]  B. S. Manjunath,et al.  Content-based search of video using color, texture, and motion , 1997, Proceedings of International Conference on Image Processing.

[57]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Jing Wang,et al.  Processing-in-Memory Enabled Graphics Processors for 3D Rendering , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[59]  David G. Lowe,et al.  Scalable Nearest Neighbor Algorithms for High Dimensional Data , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Jason Weston,et al.  #TagSpace: Semantic Embeddings from Hashtags , 2014, EMNLP.

[61]  Tejas Karkhanis,et al.  Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev..

[62]  Onur Mutlu,et al.  Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).

[63]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[64]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[65]  Engin Ipek,et al.  Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing , 2010, ISCA.

[66]  Chanik Park,et al.  Intelligent SSD: a turbo for big data mining , 2013, CIKM.

[67]  Shmuel Tomi Klein,et al.  The design of a similarity based deduplication system , 2009, SYSTOR '09.

[68]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[69]  Michel Barlaud,et al.  Fast k nearest neighbor search using GPU , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[70]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[71]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[72]  Yanjun Qi,et al.  Association Rule Mining with the Micron Automata Processor , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[73]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[74]  Aaron Lipman,et al.  The Smart Access Memory: An Intelligent RAM for Nearest Neighbor Database Searching , 1997 .

[75]  Dmitri B. Strukov,et al.  Race Logic: A hardware acceleration for dynamic programming algorithms , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[76]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[77]  C.C. Chen,et al.  A highly manufacturable 28nm CMOS low power platform technology with fully functional 64Mb SRAM using dual/tripe gate oxide process , 2006, 2009 Symposium on VLSI Technology.

[78]  Frederic T. Chong,et al.  Active pages: a computation model for intelligent memory , 1998, ISCA.

[79]  J. Thomas Pawlowski,et al.  Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).

[80]  Seung-Moon Yoo,et al.  FlexRAM: Toward an advanced Intelligent Memory system , 1999, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[81]  Alvin R. Lebeck,et al.  Exploiting accelerators for efficient high dimensional similarity search , 2016, PPoPP.

[82]  I. Jolliffe Principal Component Analysis , 2002 .

[83]  Karin Strauss,et al.  What the Future Holds for Solid-State Memory , 2014, Computer.

[84]  Engin Ipek,et al.  A resistive TCAM accelerator for data-intensive computing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[85]  Teuvo Kohonen,et al.  Self-organization and associative memory: 3rd edition , 1989 .

[86]  Regina Berretta,et al.  GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs , 2012, PloS one.

[87]  Erik Brunvand,et al.  Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[88]  Jianyu Huang,et al.  Performance optimization for the k-nearest neighbors kernel on x86 architectures , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[89]  Ronald Azuma,et al.  A Survey of Augmented Reality , 1997, Presence: Teleoperators & Virtual Environments.

[90]  Koji Nii,et al.  13.6 A 28nm 400MHz 4-parallel 1.6Gsearch/s 80Mb ternary CAM , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[91]  Shih-Fu Chang,et al.  Semi-Supervised Hashing for Large-Scale Search , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[92]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[93]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[94]  Michael Bedford Taylor,et al.  Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse , 2012, DAC Design Automation Conference 2012.

[95]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[96]  Ronald G. Dreslinski,et al.  Hardware acceleration for similarity measurement in natural language processing , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[97]  Cordelia Schmid,et al.  Evaluation of GIST descriptors for web-scale image search , 2009, CIVR '09.

[98]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[99]  Tianshi Chen,et al.  ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[100]  Sanjeev Kumar,et al.  Finding a Needle in Haystack: Facebook's Photo Storage , 2010, OSDI.

[101]  Pat Hanrahan,et al.  Photon mapping on programmable graphics hardware , 2003, HWWS '03.

[102]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[103]  Alexei A. Efros,et al.  Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph..