Sequence-Aware Factorization Machines for Temporal Predictive Analytics

In web applications such as targeted advertising and recommender systems, the available categorical features (e.g., product type) are often of great importance but sparse. As a widely adopted solution, models based on Factorization Machines (FMs) are capable of modelling high-order interactions among features for effective sparse predictive analytics. As the volume of web-scale data grows exponentially over time, sparse predictive analytics inevitably involves dynamic and sequential features. However, existing FM-based models assume no temporal order in the data and cannot capture the sequential dependencies or patterns within the dynamic features, which impedes their performance and adaptivity. Hence, in this paper, we propose a novel Sequence-Aware Factorization Machine (SeqFM) for temporal predictive analytics, which models feature interactions by fully investigating the effect of sequential dependencies. As static features (e.g., user gender) and dynamic features (e.g., user interacted items) express different semantics, we devise a multi-view self-attention scheme that separately models the effect of static features, dynamic features, and the mutual interactions between static and dynamic features in three different views. In SeqFM, we further map the learned representations of feature interactions to the desired output with a shared residual network. To showcase the versatility and generalizability of SeqFM, we test it in three popular application scenarios for FM-based models, namely ranking, classification and regression tasks. Extensive experimental results on six large-scale datasets demonstrate the superior effectiveness and efficiency of SeqFM.
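
The three views described above can be illustrated with a minimal NumPy sketch. It is one possible reading of the abstract, not the authors' implementation. Everything beyond the abstract is an assumption made for illustration: identity query/key/value projections, a causal mask for the dynamic view, a cross-view mask that permits only static-dynamic pairs, mean pooling within each view, a single shared residual layer, and a final linear projection. All function and parameter names (`masked_self_attention`, `residual_block`, `seqfm_sketch`, `w`, `b`, `p`) are hypothetical.

```python
import numpy as np


def masked_self_attention(x, mask):
    """Scaled dot-product self-attention over the rows of x.

    mask[i, j] = True means row i may attend to row j.
    Assumption: queries, keys and values are all x itself (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)          # block disallowed pairs
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def residual_block(h, w, b):
    """One residual feed-forward layer: h + ReLU(h W + b)."""
    return h + np.maximum(0.0, h @ w + b)


def seqfm_sketch(static_emb, dynamic_emb, w, b, p):
    """Score one instance from static (n_s, d) and dynamic (n_d, d) embeddings."""
    n_s, n_d = len(static_emb), len(dynamic_emb)

    # Static view: every static feature may interact with every other one.
    static_mask = np.ones((n_s, n_s), dtype=bool)

    # Dynamic view: causal mask, so position i only sees positions <= i.
    dynamic_mask = np.tril(np.ones((n_d, n_d), dtype=bool))

    # Cross view: only static<->dynamic pairs are allowed.
    n = n_s + n_d
    cross_mask = np.zeros((n, n), dtype=bool)
    cross_mask[:n_s, n_s:] = True
    cross_mask[n_s:, :n_s] = True

    views = [
        masked_self_attention(static_emb, static_mask),
        masked_self_attention(dynamic_emb, dynamic_mask),
        masked_self_attention(np.vstack([static_emb, dynamic_emb]), cross_mask),
    ]

    # Mean-pool each view, pass it through the same (shared) residual block,
    # then concatenate the three views and project to a scalar.
    pooled = [residual_block(v.mean(axis=0), w, b) for v in views]
    return float(np.concatenate(pooled) @ p)


# Usage with random toy embeddings (2 static features, 3 sequence items, d = 4).
rng = np.random.default_rng(0)
static_emb = rng.normal(size=(2, 4))
dynamic_emb = rng.normal(size=(3, 4))
w, b, p = rng.normal(size=(4, 4)), np.zeros(4), rng.normal(size=12)
print(seqfm_sketch(static_emb, dynamic_emb, w, b, p))
```

The scalar returned here would feed a task-specific loss (ranking, classification or regression); that part is omitted because the abstract does not specify it.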
