Online bagging for recommender systems

Ensemble methods have been used successfully in the past to improve recommender systems; however, they have never been studied with incremental recommendation algorithms. Many online recommender systems deal with continuous, potentially fast, and unbounded flows of data (big data streams) and often need to be responsive to fresh user feedback, adjusting recommendations accordingly. This is evident in tasks such as social network feeds, news recommendation, automatic playlist completion, and similar applications. Batch ensemble approaches are not suitable for continuous learning, given the cost of retraining models on demand. In this paper, we adapt a general-purpose online bagging algorithm for top-N recommendation tasks and propose two novel online bagging methods specifically tailored for recommender systems. We evaluate the three approaches using an incremental matrix factorization algorithm for top-N recommendation with positive-only user feedback as the base model. Our results show that online bagging improves accuracy by up to 55% over the baseline, with manageable computational overhead.
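For context, the general-purpose algorithm the abstract refers to follows the Oza-Russell online bagging scheme, in which each incoming example updates each ensemble member k times, with k drawn from a Poisson(1) distribution, as the streaming analogue of bootstrap resampling. The sketch below shows how that scheme might wrap an incremental top-N recommender; it is a minimal illustration, not the paper's exact design. The BaseRecommender interface (update/score methods) and the score-averaging combination of members are assumptions made for the example.

```python
import math
import random
from collections import defaultdict


class OnlineBagging:
    """Oza-Russell online bagging around incremental recommenders.

    Each observed (user, item) feedback event updates every ensemble
    member k times, with k ~ Poisson(1) -- the streaming analogue of
    drawing a bootstrap sample in batch bagging.
    """

    def __init__(self, model_factory, n_models=10, seed=42):
        # model_factory is a hypothetical callable returning an
        # incremental recommender with update(user, item) and
        # score(user, item) methods.
        self.models = [model_factory() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def _poisson1(self):
        # Knuth's algorithm for sampling Poisson(lambda=1).
        threshold = math.exp(-1.0)
        k, p = 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def update(self, user, item):
        # Positive-only feedback: a single observed interaction.
        for model in self.models:
            for _ in range(self._poisson1()):
                model.update(user, item)

    def recommend(self, user, candidates, n=10):
        # Combine members by averaging their item scores (an
        # assumption; rank aggregation is an equally valid choice).
        scores = defaultdict(float)
        for model in self.models:
            for item in candidates:
                scores[item] += model.score(user, item)
        return sorted(candidates, key=lambda i: -scores[i])[:n]
```

Drawing k from Poisson(1) matches the expected number of times an example appears in a batch bootstrap sample, so each member sees a slightly different view of the stream without any data being stored or replayed.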
