Streaming Algorithms for News and Scientific Literature Recommendation: Submodular Maximization with a d-Knapsack Constraint

Submodular maximization is a class of combinatorial optimization problems with wide-ranging applications. In this paper, we study the maximization of a monotone submodular function subject to a $d$-knapsack constraint and propose a streaming algorithm that achieves a $\left(\frac{1}{1+2d}-\epsilon\right)$-approximation of the optimal value. The algorithm makes a single pass over the dataset and does not need to hold all of the data in memory. We evaluate it extensively on two applications: news recommendation and scientific literature recommendation. In these experiments, the proposed streaming algorithm is faster and uses less memory than existing approaches by several orders of magnitude.
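To make the single-pass setting concrete, the following is a minimal sketch of a density-threshold selection rule under a $d$-knapsack constraint. It is not the algorithm proposed in the paper and carries no approximation guarantee: the fixed threshold `tau`, the topic-coverage objective, the item format, and the name `stream_select` are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical item format: (id, topics the item covers, d-dimensional cost vector).
Item = Tuple[str, Set[str], List[float]]


def stream_select(stream: Iterable[Item], budgets: List[float], tau: float) -> List[str]:
    """One pass over the stream, keeping only the chosen set in memory.

    Objective (assumed): f(S) = number of distinct topics covered by S,
    which is monotone and submodular.
    Rule (assumed): accept an item if it fits in every knapsack dimension
    and its marginal gain per unit of normalised cost is at least tau.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    used = [0.0] * len(budgets)

    for item_id, topics, cost in stream:
        # Marginal gain of adding this item to the current selection.
        gain = len(topics - covered)
        # Largest fraction of any single budget that this item consumes.
        density_cost = max(c / b for c, b in zip(cost, budgets))
        # Feasibility in all d knapsack dimensions.
        fits = all(u + c <= b for u, c, b in zip(used, cost, budgets))

        if fits and gain > 0 and gain >= tau * density_cost:
            chosen.append(item_id)
            covered |= topics
            used = [u + c for u, c in zip(used, cost)]

    return chosen


# Example with d = 2 and unit budgets.
articles = [
    ("a", {"t1", "t2"}, [0.5, 0.2]),
    ("b", {"t2"}, [0.1, 0.1]),        # adds no new topic
    ("c", {"t3"}, [0.6, 0.1]),        # would exceed the first budget
    ("d", {"t3", "t4"}, [0.25, 0.5]),
]
selected = stream_select(articles, budgets=[1.0, 1.0], tau=2.0)
```

Each item is examined once and then discarded, so memory grows only with the size of the selected set. In practice a suitable threshold is not known in advance; here it is simply passed in as a parameter.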
