Bayesian feature interaction selection for factorization machines

Abstract: Factorization machines are a generic supervised learning method that effectively models feature interactions and applies to a wide range of artificial intelligence tasks, such as prediction and inference. However, handling combinations of features is expensive because the number of feature interactions grows exponentially with the interaction order. In practice, not all feature interactions are equally useful for prediction. Feature interaction selection methods have recently attracted considerable attention because they effectively filter out useless feature interactions. Current feature interaction selection methods suffer from two limitations: (1) they assume that all users share the same feature interactions; and (2) they select pairwise feature interactions only. In this paper, we propose novel Bayesian variable selection methods for feature interaction selection in factorization machines, which effectively reduce the number of interactions. We study personalized feature interaction selection to account for individual preferences, and extend the model to higher-order feature interaction selection on higher-order factorization machines. We provide empirical evidence for the advantages of the proposed Bayesian feature interaction selection methods on different prediction tasks.
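
To make the idea of selecting feature interactions concrete, the following is a minimal sketch of a standard second-order factorization machine score in which each pairwise interaction can be switched on or off by a binary mask. The function name `fm_predict`, the `mask` argument, and the toy values are illustrative assumptions. The sketch uses a fixed, hand-set mask and does not reproduce the Bayesian selection procedure proposed in the paper, which infers which interactions to keep.

```python
import numpy as np

def fm_predict(x, w0, w, V, mask=None):
    """Second-order FM score for a single feature vector x.

    x    : (d,) feature vector
    w0   : global bias
    w    : (d,) linear weights
    V    : (d, k) latent factors; interaction weight of (i, j) is <V[i], V[j]>
    mask : optional (d, d) 0/1 array (hypothetical selection indicator);
           mask[i, j] = 1 keeps the interaction between features i and j.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    if mask is None:
        mask = np.ones((d, d))            # keep every interaction
    pair_weights = V @ V.T                # <v_i, v_j> for all pairs
    products = np.outer(x, x)             # x_i * x_j for all pairs
    upper = np.triu(np.ones((d, d)), k=1) # count each pair i < j once
    interactions = np.sum(upper * mask * pair_weights * products)
    return w0 + w @ x + interactions

# Toy example (assumed values): three features, two latent dimensions.
x = np.array([1.0, 1.0, 0.0])
w0 = 0.5
w = np.array([0.1, 0.2, 0.3])
V = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

full_score = fm_predict(x, w0, w, V)

# Drop the interaction between features 0 and 1.
mask = np.ones((3, 3))
mask[0, 1] = 0
masked_score = fm_predict(x, w0, w, V, mask)
```

In this picture, a mask shared by all users corresponds to the first limitation named in the abstract, while a separate mask per user would correspond to personalized selection. The dense `(d, d)` mask is chosen only for readability; with many features a sparse representation would be needed.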
