Reducing tail latency of DNN-based recommender systems using in-storage processing

Most recommender systems are designed to comply with service level agreements (SLAs), because prompt responses to user requests are the most important factor in determining quality of service. Existing recommender systems, however, suffer from long tail latency when the embedding tables cannot be loaded entirely into main memory. In this paper, we propose a new SSD architecture called EMB-SSD, which mitigates the tail latency problem of recommender systems by leveraging in-storage processing. By offloading the data-intensive parts of the recommendation algorithm to the SSD, EMB-SSD not only reduces data traffic between the host and the SSD but also lowers the software overhead caused by deep I/O stacks. Our results show that EMB-SSD achieves 47% shorter 99th-percentile latency and 25% shorter average latency than existing systems.
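To make the traffic argument concrete, the following is a rough back-of-the-envelope sketch, not the EMB-SSD design itself. It assumes the offloaded work is an embedding lookup followed by sum pooling (as in DLRM-style models), that the host otherwise reads one full page per lookup, and that request-side index traffic is negligible. The page size, element width, and all function names are hypothetical choices for illustration.

```python
# Toy traffic model. All sizes are illustrative assumptions,
# not figures taken from the paper.

PAGE_BYTES = 4096  # assumed read granularity between host and SSD


def host_side_bytes(num_lookups: int, page_bytes: int = PAGE_BYTES) -> int:
    """Bytes crossing the host-SSD link when the host fetches each row itself.

    Worst case: every looked-up embedding row sits in a different page,
    so each lookup pulls one full page across the link.
    """
    return num_lookups * page_bytes


def in_storage_bytes(num_tables: int, embedding_dim: int,
                     bytes_per_elem: int = 4) -> int:
    """Bytes crossing the link when the SSD gathers and sum-pools internally.

    Assumption: only one pooled vector per embedding table is returned.
    """
    return num_tables * embedding_dim * bytes_per_elem


def traffic_reduction(num_tables: int, lookups_per_table: int,
                      embedding_dim: int) -> float:
    """Ratio of host-side traffic to in-storage traffic for one request."""
    host = host_side_bytes(num_tables * lookups_per_table)
    offloaded = in_storage_bytes(num_tables, embedding_dim)
    return host / offloaded


if __name__ == "__main__":
    # Hypothetical workload: 8 tables, 80 lookups each, 32-dim float32 rows.
    print(traffic_reduction(num_tables=8, lookups_per_table=80, embedding_dim=32))
```

Under these assumptions the ratio grows with the number of lookups per table, because the pooled result stays the same size regardless of how many rows are gathered inside the device.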
