FlashEmbedding: storing embedding tables in SSD for large-scale recommender systems

We present FlashEmbedding, a hardware/software co-design solution that stores embedding tables on SSDs for large-scale recommendation inference on memory-capacity-limited systems. FlashEmbedding combines an embedding-semantics-aware SSD, an embedding-oriented software cache, and pipelining techniques to improve overall performance. We evaluate FlashEmbedding with our FPGA-based prototype SSD on a real-world public dataset. FlashEmbedding achieves up to 17.44× lower embedding-lookup latency and 2.89× lower end-to-end latency than the baseline solution on a memory-capacity-limited system.
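
To make the design concrete, the following is a minimal, hypothetical sketch of the lookup path the abstract implies: an embedding-oriented DRAM cache in front of SSD-resident embedding tables, with cache misses gathered and fetched from the device in one batched read. The embedding dimension, the SsdTable and EmbeddingCache names, and the memmap-backed table are illustrative assumptions rather than interfaces from the paper, and the paper's pipelining of device reads with compute is omitted.

```python
# Hypothetical sketch only: names, sizes, and interfaces are assumptions,
# not FlashEmbedding's actual API.
from collections import OrderedDict
import numpy as np

EMB_DIM = 64  # assumed embedding dimension (float32 rows)

class SsdTable:
    """Stand-in for an SSD-resident embedding table (backed here by a memmap)."""
    def __init__(self, path, num_rows):
        self.data = np.memmap(path, dtype=np.float32, mode="r",
                              shape=(num_rows, EMB_DIM))

    def read_rows(self, row_ids):
        # In the real system this would be a batched (possibly in-storage) read.
        return self.data[row_ids]

class EmbeddingCache:
    """LRU cache over embedding rows; misses fall back to the SSD table."""
    def __init__(self, capacity_rows, ssd_table):
        self.capacity = capacity_rows
        self.ssd = ssd_table
        self.rows = OrderedDict()  # row_id -> np.ndarray

    def lookup(self, row_ids):
        out = np.empty((len(row_ids), EMB_DIM), dtype=np.float32)
        miss_pos, miss_ids = [], []
        for i, rid in enumerate(row_ids):
            vec = self.rows.get(rid)
            if vec is None:
                miss_pos.append(i)      # defer misses so they can be batched
                miss_ids.append(rid)
            else:
                self.rows.move_to_end(rid)  # mark as recently used
                out[i] = vec
        if miss_ids:
            fetched = self.ssd.read_rows(miss_ids)  # one batched device read
            for pos, rid, vec in zip(miss_pos, miss_ids, fetched):
                out[pos] = vec
                self.rows[rid] = np.array(vec)
                if len(self.rows) > self.capacity:
                    self.rows.popitem(last=False)   # evict least recently used row
        return out.sum(axis=0)  # sum-pooled embedding for one sparse feature
```

In this sketch the cache absorbs hot rows in DRAM while cold rows are served from the SSD, which is the division of labor the abstract describes; overlapping the batched SSD read with other work is where the pipelining would apply.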
