Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach

We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint k, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) m items from the algorithm's memory are removed after the stream has finished. We develop STAR-T, a robust streaming algorithm for this setting, based on a novel partitioning structure and an exponentially decreasing thresholding rule. STAR-T makes one pass over the data and retains a short but robust summary. We show that after the removal of any m elements from the obtained summary, a simple greedy algorithm, STAR-T-GREEDY, run on the remaining elements achieves a constant-factor approximation guarantee. In two different data summarization tasks, we demonstrate that STAR-T matches or outperforms existing greedy and streaming methods, even if they are allowed the benefit of knowing the removed subset in advance.
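To make the post-removal step concrete, the following is a minimal sketch of a plain greedy selection run on the elements that survive a removal. It is not the paper's STAR-T or STAR-T-GREEDY: the partitioning structure and thresholding rule are not reproduced here, and the coverage objective, the toy data, and the names `make_coverage` and `greedy_max` are all assumptions introduced only for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def make_coverage(sets: Dict[Hashable, Set[int]]) -> Callable[[Iterable[Hashable]], int]:
    """Return a coverage function f(S) = |union of sets[e] for e in S|.

    Coverage is a standard example of a monotone submodular function.
    (Hypothetical helper, not taken from the paper.)
    """
    def f(selection: Iterable[Hashable]) -> int:
        covered: Set[int] = set()
        for e in selection:
            covered |= sets[e]
        return len(covered)
    return f


def greedy_max(f: Callable[[Iterable[Hashable]], int],
               candidates: List[Hashable], k: int) -> List[Hashable]:
    """Pick up to k candidates, each time adding the largest marginal gain."""
    chosen: List[Hashable] = []
    remaining = list(candidates)
    for _ in range(k):
        if not remaining:
            break
        base = f(chosen)
        # max() keeps the first candidate on ties, so the result is deterministic.
        best = max(remaining, key=lambda e: f(chosen + [e]) - base)
        if f(chosen + [best]) - base <= 0:
            break  # no candidate adds value; stop early
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data (assumed): each summary element covers a set of item ids.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2, 3, 4}, 4: {6, 7}}
f = make_coverage(sets)

summary = [0, 1, 2, 3, 4]   # stands in for a summary kept after the stream
removed = {3}               # m = 1 element removed after the stream ends
survivors = [e for e in summary if e not in removed]

chosen = greedy_max(f, survivors, k=2)
value = f(chosen)
print(chosen, value)
```

In this toy instance the removed element is the single most valuable one, yet the surviving elements still contain a good size-2 solution. Retaining enough redundancy in the summary for this to hold after any m removals is the role the abstract attributes to STAR-T.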
