Cognitive SSD: A Deep Learning Engine for In-Storage Data Retrieval

Data analysis and retrieval is a widely used component of existing artificial intelligence systems. However, each request has to traverse every layer of the I/O stack, which moves a tremendous amount of irrelevant data between secondary storage, DRAM, and the on-chip cache. This leads to high response latency and rising energy consumption. To address this issue, we propose Cognitive SSD, an energy-efficient engine for deep-learning-based unstructured data retrieval. In Cognitive SSD, a flash-accessing accelerator named DLG-x is placed alongside the flash memory to perform deep learning and graph search near the data. These in-SSD deep learning and graph search functions are exposed to users as library APIs through NVMe command extensions. Experimental results on an FPGA-based prototype show that the proposed Cognitive SSD reduces latency by 69.9% on average compared with CPU-based solutions on conventional SSDs. It also reduces overall system power consumption by up to 34.4% and 63.0% compared with CPU-based and GPU-based solutions, respectively, that deliver comparable performance.
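The abstract does not spell out how the graph search step works. As a rough illustration only, the sketch below runs a greedy best-first search over a k-nearest-neighbour (kNN) graph in which each stored item is represented by a binary hash code and compared by Hamming distance. The hash-code representation, the adjacency-list layout, and the names `hamming`, `graph_search`, `entry`, and `pool_size` are all assumptions chosen for illustration. They are not DLG-x's actual design or its library API. In Cognitive SSD this kind of work runs inside the SSD rather than on the host CPU.

```python
import heapq
from typing import Dict, List, Sequence


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary hash codes stored as Python ints."""
    return bin(a ^ b).count("1")


def graph_search(
    codes: Sequence[int],
    graph: Dict[int, List[int]],
    query: int,
    entry: int,
    k: int,
    pool_size: int = 8,
) -> List[int]:
    """Greedy best-first search over a kNN graph (hypothetical helper).

    codes: binary hash code of each database item, indexed by item id.
    graph: adjacency list; graph[i] lists the ids of item i's approximate neighbours.
    entry: id of the node the search starts from (assumed, e.g. a fixed hub node).
    pool_size: how many best-so-far results to keep; larger means higher recall.
    Returns the ids of the k closest items found, nearest first.
    """
    def dist(i: int) -> int:
        return hamming(codes[i], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap of nodes still to expand
    pool = [(-dist(entry), entry)]       # max-heap (negated distances) of best results
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is farther than the worst kept result.
        if len(pool) >= pool_size and d > -pool[0][0]:
            break
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            # Only neighbours that could improve the result pool are expanded later.
            if len(pool) < pool_size or dn < -pool[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(pool, (-dn, nb))
                if len(pool) > pool_size:
                    heapq.heappop(pool)  # evict the current worst result
    return [i for _, i in sorted((-nd, i) for nd, i in pool)][:k]
```

For example, with six 4-bit codes `[0b0000, 0b0001, 0b0011, 0b0111, 0b1111, 0b1110]` linked in a ring, `graph_search(codes, graph, 0b0111, entry=0, k=2)` returns `[3, 2]`. The bounded result pool trades some recall for fewer distance computations, which would matter when every code comparison requires reading data from flash.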
