Cognitive SSD: A Deep Learning Engine for In-Storage Data Retrieval

Data analysis and retrieval is a widely-used component in existing artificial intelligence systems. However, each request has to go through each layer across the I/O stack, which moves tremendous irrelevant data between secondary storage, DRAM, and the on-chip cache. This leads to high response latency and rising energy consumption. To address this issue, we propose Cognitive SSD, an energy-efficient engine for deep learning based unstructured data retrieval. In Cognitive SSD, a flash-accessing accelerator named DLG-x is placed by the side of flash memory to achieve near-data deep learning and graph search. Such functions of in-SSD deep learning and graph search are exposed to the users as library APIs via NVMe command extension. Experimental results on the FPGA-based prototype reveal that the proposed Cognitive SSD reduces latency by 69.9% on average in comparison with CPU based solutions on conventional SSDs, and it reduces the overall system power consumption by up to 34.4% and 63.0% respectively when compared to CPU and GPU based solutions that deliver comparable performance.

[1]  Javier González,et al.  LightNVM: The Linux Open-Channel SSD Subsystem , 2017, FAST.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Wu-Jun Li,et al.  Feature Learning Based Deep Supervised Hashing with Pairwise Labels , 2015, IJCAI.

[4]  Mahmut T. Kandemir,et al.  FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs , 2018, OSDI.

[5]  Onur Mutlu,et al.  Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery , 2017, ArXiv.

[6]  Peter Desnoyers,et al.  Reducing Data Movement Costs Using Energy-Efficient, Active Computation on SSD , 2012, HotPower.

[7]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[8]  Chanik Park,et al.  Enabling cost-effective data processing with smart SSD , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[9]  Chanik Park,et al.  Active disk meets flash: a case for intelligent SSDs , 2013, ICS '13.

[10]  Steven Swanson,et al.  Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.

[11]  Jiwen Lu,et al.  Deep hashing for compact binary codes learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Shiguang Shan,et al.  Deep Supervised Hashing for Fast Image Retrieval , 2016, International Journal of Computer Vision.

[13]  Sungjin Lee,et al.  BlueDBM: An appliance for Big Data analytics , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[14]  Chu-Song Chen,et al.  Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Youyou Lu,et al.  Extending the lifetime of flash-based storage through reducing write amplification from file systems , 2013, FAST.

[16]  Rajesh Gupta,et al.  Minerva: Accelerating Data Analysis in Next-Generation SSDs , 2013, 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines.

[17]  Yong Wang,et al.  Active SSD design for energy-efficiency improvement of web-scale data analysis , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[18]  David J. DeWitt,et al.  Query processing on smart SSDs: opportunities and challenges , 2013, SIGMOD '13.

[19]  Rajesh K. Gupta,et al.  Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[20]  Qi Tian,et al.  SIFT Meets CNN: A Decade Survey of Instance Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[23]  Deng Cai,et al.  EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph , 2016, ArXiv.

[24]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[25]  Deng Cai,et al.  Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph , 2017, Proc. VLDB Endow..

[26]  Gustavo Alonso,et al.  Ibex - An Intelligent Storage Engine with Support for Advanced SQL Off-loading , 2014, Proc. VLDB Endow..

[27]  Deng Cai,et al.  Fast Approximate Nearest Neighbor Search With Navigating Spreading-out Graphs , 2017, ArXiv.

[28]  Peter Desnoyers,et al.  Active Flash: Out-of-core data analytics on flash storage , 2012, 012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).

[29]  Sean Eilert,et al.  DataCenter 2020: Near-memory acceleration for data-oriented applications , 2014, 2014 Symposium on VLSI Circuits Digest of Technical Papers.

[30]  Yang Liu,et al.  Willow: A User-Programmable SSD , 2014, OSDI.

[31]  Youyou Lu,et al.  ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices , 2016, USENIX Annual Technical Conference.

[32]  Peter Desnoyers,et al.  Active flash: towards energy-efficient, in-situ data analytics on extreme-scale machines , 2013, FAST.

[33]  Ninghui Sun,et al.  DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[34]  Xuemin Lin,et al.  Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement , 2016, IEEE Transactions on Knowledge and Data Engineering.

[35]  Yannis Papakonstantinou,et al.  SSD in-storage computing for list intersection , 2016, DaMoN '16.

[36]  Yong Wang,et al.  SDF: software-defined flash for web-scale internet storage systems , 2014, ASPLOS.

[37]  Tao Li,et al.  Octopus: an RDMA-enabled Distributed Persistent Memory File System , 2017, USENIX Annual Technical Conference.

[38]  Heng Tao Shen,et al.  Hashing for Similarity Search: A Survey , 2014, ArXiv.

[39]  Steven Swanson,et al.  Morpheus: Creating Application Objects Efficiently for Heterogeneous Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[40]  Xiaowei Li,et al.  C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[41]  Hyeonsang Eom,et al.  A User-Level File System for Fast Storage Devices , 2014, ICCAC.

[42]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[43]  Sungroh Yoon,et al.  Near-Data Processing for Machine Learning , 2016, ArXiv.

[44]  Jie Xu,et al.  DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[45]  Krista A. Ehinger,et al.  SUN Database: Exploring a Large Collection of Scene Categories , 2014, International Journal of Computer Vision.

[46]  Sizhuo Zhang,et al.  GraFBoost: Using Accelerated Flash Storage for External Graph Analytics , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[47]  Jen-Hao Hsiao,et al.  Deep learning of binary hash codes for fast image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[48]  Deng Cai A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search , 2016, ArXiv.