Submodular Optimization Over Streams with Inhomogeneous Decays

Cardinality constrained submodular function maximization, which aims to select a subset of size at most $k$ to maximize a monotone submodular utility function, is the key in many data mining and machine learning applications such as data summarization and maximum coverage problems. When data is given as a stream, streaming submodular optimization (SSO) techniques are desired. Existing SSO techniques can only apply to insertion-only streams where each element has an infinite lifespan, and sliding-window streams where each element has a same lifespan (i.e., window size). However, elements in some data streams may have arbitrary different lifespans, and this requires addressing SSO over streams with inhomogeneous-decays (SSO-ID). This work formulates the SSO-ID problem and presents three algorithms: BasicStreaming is a basic streaming algorithm that achieves an $(1/2-\epsilon)$ approximation factor; HistApprox improves the efficiency significantly and achieves an $(1/3-\epsilon)$ approximation factor; HistStreaming is a streaming version of HistApprox and uses heuristics to further improve the efficiency. Experiments conducted on real data demonstrate that HistStreaming can find high quality solutions and is up to two orders of magnitude faster than the naive Greedy algorithm.

[1]  Jon Kleinberg,et al.  Maximizing the spread of influence through a social network , 2003, KDD '03.

[2]  Andreas Krause,et al.  Cost-effective outbreak detection in networks , 2007, KDD '07.

[3]  Graham Cormode,et al.  Set cover algorithms for very large datasets , 2010, CIKM.

[4]  Morteza Zadimoghaddam,et al.  Bicriteria Distributed Submodular Maximization in a Few Rounds , 2017, SPAA.

[5]  Qin Zhang,et al.  Submodular Maximization over Sliding Windows , 2016, ArXiv.

[6]  Éva Tardos,et al.  Maximizing the Spread of Influence through a Social Network , 2015, Theory Comput..

[7]  Morteza Zadimoghaddam,et al.  Data Summarization at Scale: A Two-Stage Submodular Approach , 2018, ICML.

[8]  Gavin Brown,et al.  Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection , 2012, J. Mach. Learn. Res..

[9]  Sergei Vassilvitskii,et al.  Fast greedy algorithms in mapreduce and streaming , 2013, SPAA.

[10]  M. L. Fisher,et al.  An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..

[11]  Sreenivas Gollapudi,et al.  Diversifying search results , 2009, WSDM '09.

[12]  Silvio Lattanzi,et al.  Submodular Optimization Over Sliding Windows , 2016, WWW.

[13]  Piotr Indyk,et al.  Maintaining Stream Statistics over Sliding Windows , 2002, SIAM J. Comput..

[14]  Jure Leskovec,et al.  Meme-tracking and the dynamics of the news cycle , 2009, KDD.

[15]  Andreas Krause,et al.  Streaming submodular maximization: massive data summarization on the fly , 2014, KDD.

[16]  Andreas Krause,et al.  Submodular Function Maximization , 2014, Tractability.

[17]  Andreas Krause,et al.  Lazier Than Lazy Greedy , 2014, AAAI.

[18]  Rafail Ostrovsky,et al.  Smooth Histograms for Sliding Windows , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[19]  Michel Minoux,et al.  Accelerated greedy algorithms for maximizing submodular set functions , 1978 .

[20]  Sergei Vassilvitskii,et al.  Fast greedy algorithms in mapreduce and streaming , 2013, SPAA.

[21]  Graham Cormode,et al.  Time-decaying Sketches for Robust Aggregation of Sensor Data , 2009, SIAM J. Comput..

[22]  Edith Cohen,et al.  Maintaining time-decaying stream aggregates , 2006, J. Algorithms.

[23]  Andreas Krause,et al.  Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies , 2008, J. Mach. Learn. Res..