Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer

Embedding learning for categorical features is crucial for deep learning-based recommendation models (DLRMs): each feature value is mapped to an embedding vector through an embedding learning process. Conventional methods assign a fixed, uniform embedding size to all feature values from the same feature field, but such a configuration is not only sub-optimal for embedding learning but also costly in memory. Existing methods that attempt to resolve these problems, whether rule-based or neural architecture search (NAS)-based, require extensive human design effort or network training, and they are inflexible both in embedding size selection and in warm-start applications. In this paper, we propose a novel and effective embedding size selection scheme. Specifically, we design an Adaptively-Masked Twins-based Layer (AMTL) placed behind the standard embedding layer. AMTL generates a mask vector that masks the undesired dimensions of each embedding vector. The mask vector brings flexibility to dimension selection, and the proposed layer can be easily added to either untrained or trained DLRMs. Extensive experimental evaluations show that the proposed scheme outperforms competitive baselines on all benchmark tasks and is also memory-efficient, saving 60% of memory usage without compromising any performance metric.
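
To make the mechanism concrete, below is a minimal PyTorch sketch of such an adaptively-masked layer. This is an illustration under stated assumptions, not the authors' implementation: the twin-branch routing by a frequency signal, conditioning the mask generator on the embedding itself, the Gumbel-softmax relaxation used to pick the cut-off dimension differentiably, and all module and parameter names (AMTL, branch_hi, freq, ...) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMTL(nn.Module):
    """Sketch of an adaptively-masked layer placed behind a standard
    embedding layer: it picks a cut-off dimension d per instance and
    zeroes out the embedding dimensions beyond d."""

    def __init__(self, embed_dim: int, hidden_dim: int = 32):
        super().__init__()
        # "Twin" branches: one mask generator for frequent feature values,
        # one for rare ones (the routing rule below is an assumption).
        def branch():
            return nn.Sequential(
                nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, embed_dim),
            )
        self.branch_hi, self.branch_lo = branch(), branch()

    def forward(self, emb: torch.Tensor, freq: torch.Tensor) -> torch.Tensor:
        # emb:  (batch, embed_dim) output of the standard embedding layer
        # freq: (batch,) frequency signal in [0, 1] used for routing
        logits = torch.where(freq.unsqueeze(-1) > 0.5,
                             self.branch_hi(emb), self.branch_lo(emb))
        # Differentiably sample a one-hot "keep d dimensions" choice
        # (Gumbel-softmax with a straight-through estimator, hard=True).
        one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
        # A reverse cumulative sum turns the one-hot at index d into a
        # prefix mask: 1 for dimensions <= d, 0 for the rest.
        mask = torch.flip(torch.cumsum(torch.flip(one_hot, [-1]), -1), [-1])
        return emb * mask

# Hypothetical usage: 16-dim embeddings for a field with 1000 values.
table = nn.Embedding(1000, 16)
amtl = AMTL(embed_dim=16)
ids = torch.randint(0, 1000, (4,))
freq = torch.rand(4)               # stand-in for per-value frequency stats
masked = amtl(table(ids), freq)    # (4, 16), trailing dims zeroed per row
```

Because the masking happens behind the embedding table rather than inside it, a layer like this can be bolted onto an already-trained model as well as a fresh one, which is the warm-start flexibility the abstract refers to.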
