DreamShard: Generalizable Embedding Table Placement for Recommender Systems

We study embedding table placement for distributed recommender systems, which aims to partition the embedding tables and place them on multiple hardware devices (e.g., GPUs) so that computation and communication costs are balanced. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains challenging because of 1) the operation fusion of embedding tables, and 2) the requirement to generalize to unseen placement tasks with different numbers of tables and/or devices. To this end, we present DreamShard, a reinforcement learning (RL) approach for embedding table placement. DreamShard reasons about operation fusion and generalizes through 1) a cost network that directly predicts the costs of the fused operation, and 2) a policy network that is efficiently trained on an estimated Markov decision process (MDP) without real GPU execution, where the states and rewards are estimated by the cost network. Equipped with sum and max representation reductions, the two networks generalize directly to any unseen task with a different number of tables and/or devices, without fine-tuning. Extensive experiments show that DreamShard substantially outperforms existing human-expert and RNN-based strategies, with up to 19% speedup over the strongest baseline on large-scale synthetic tables and on our production tables. The code is available at https://github.com/daochenzha/dreamshard
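The abstract relies on two properties. First, sum and max reductions make the cost network independent of how many tables and devices a task has. Second, the cost network alone can drive an estimated MDP with no GPU execution. The Python sketch below illustrates both under stated assumptions; it is not DreamShard's implementation. The table features, layer sizes, random untrained weights, the hypothetical names `ToyCostNet`, `rollout`, and `greedy_policy`, the choice to apply the max over per-device cost estimates (rather than over representations), and the single end-of-episode reward are all assumptions. A greedy search over estimated costs stands in for the learned policy network; no RL training is shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def relu_layer(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Dense layer followed by ReLU: out[i] = max(0, sum_j W[i][j] * x[j])."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); keeps predicted costs positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class ToyCostNet:
    """Hypothetical cost network with random, untrained weights.

    Each table is a feature vector of length n_features (e.g. embedding
    dimension, hash size, pooling factor; the feature choice is assumed).
    """

    def __init__(self, n_features: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.w_table = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                        for _ in range(hidden)]
        self.w_head = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def device_cost(self, tables: Sequence[Sequence[float]]) -> float:
        # Sum reduction: the device representation is the sum of its table
        # representations, so it ignores order and accepts any table count.
        rep = [0.0] * self.hidden
        for t in tables:
            for i, v in enumerate(relu_layer(t, self.w_table)):
                rep[i] += v
        return softplus(sum(w * r for w, r in zip(self.w_head, rep)))

    def overall_cost(self, placement: Sequence[Sequence[Sequence[float]]]) -> float:
        # Max reduction (assumed to act on per-device costs): the slowest
        # device bounds the step time, for any number of devices.
        return max(self.device_cost(dev) for dev in placement)


def greedy_policy(cost_net: ToyCostNet, placement, table) -> int:
    """Stand-in for a learned policy: choose the device whose estimated
    overall cost is lowest after adding `table`."""
    best_device, best_cost = 0, math.inf
    for d in range(len(placement)):
        trial = [list(dev) for dev in placement]
        trial[d].append(table)
        cost = cost_net.overall_cost(trial)
        if cost < best_cost:
            best_device, best_cost = d, cost
    return best_device


def rollout(cost_net: ToyCostNet, tables, n_devices: int, policy):
    """One episode of the estimated MDP; no GPU execution is involved.

    State:  the tables placed so far on each device.
    Action: the device index for the next table.
    Reward: negative estimated overall cost, given once at the end (assumed).
    """
    placement = [[] for _ in range(n_devices)]
    for table in tables:
        placement[policy(cost_net, placement, table)].append(table)
    return placement, -cost_net.overall_cost(placement)


if __name__ == "__main__":
    rng = random.Random(1)
    net = ToyCostNet(n_features=3)
    # The same weights handle 10 tables on 2 devices and 25 tables on 4.
    for n_tables, n_devices in [(10, 2), (25, 4)]:
        tables = [[rng.random() for _ in range(3)] for _ in range(n_tables)]
        placement, reward = rollout(net, tables, n_devices, greedy_policy)
        print(n_tables, n_devices, [len(d) for d in placement], round(reward, 3))
```

Because the pooled representation has a fixed size no matter how many tables it summarizes, and the max accepts any number of devices, one set of weights scores tasks of different sizes without modification. In the approach described above, the cost network would be fit to real cost measurements and the policy network trained with RL on rollouts like this one; this sketch skips both steps.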
