Do Less, Get More: Streaming Submodular Maximization with Subsampling

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling the elements of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a $p$-matchoid constraint, our randomized algorithm achieves a $4p$ approximation ratio (in expectation) with $O(k)$ memory and $O(km/p)$ queries per element, where $k$ is the size of the largest feasible solution and $m$ is the number of matroids used to define the constraint. For the non-monotone case, our approximation ratio increases only slightly, to $4p+2-o(1)$. To the best of our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility.
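The idea of skipping most of the stream can be illustrated with a minimal sketch. It is not the algorithm analyzed in the paper: it assumes a plain cardinality constraint (a uniform matroid, the simplest special case of a $p$-matchoid), a toy coverage objective, and a naive "swap if it strictly improves" rule. The names `coverage`, `subsampled_stream`, and the sampling probability `q` are hypothetical and chosen only for illustration.

```python
import random


def coverage(sets, chosen):
    """Toy monotone submodular objective: number of distinct items covered."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def subsampled_stream(sets, k, q, seed=0):
    """One pass over the stream; each element is kept with probability q.

    Skipped elements trigger no function evaluation at all. The swap rule
    below is an illustrative assumption, not the rule used in the paper.
    Returns the selected indices and the number of elements processed.
    """
    rng = random.Random(seed)
    solution = []
    processed = 0
    for i in range(len(sets)):
        if rng.random() >= q:
            continue  # element is dropped without evaluating the objective
        processed += 1
        current = coverage(sets, solution)
        if len(solution) < k:
            # Room left: add the element if it has positive marginal gain.
            if coverage(sets, solution + [i]) > current:
                solution.append(i)
            continue
        # Solution is full: look for the single swap with the best value.
        best_val, best_j = current, None
        for j in range(len(solution)):
            candidate = solution[:j] + solution[j + 1:] + [i]
            val = coverage(sets, candidate)
            if val > best_val:
                best_val, best_j = val, j
        if best_j is not None:
            solution[best_j] = i
    return solution, processed


if __name__ == "__main__":
    data_rng = random.Random(1)
    stream = [set(data_rng.sample(range(50), 5)) for _ in range(200)]
    chosen, seen = subsampled_stream(stream, k=5, q=0.25)
    print("processed", seen, "of", len(stream), "elements")
    print("covered items:", coverage(stream, chosen))
```

In this sketch memory stays at $O(k)$ because only the current solution is stored, and the number of objective evaluations scales with the number of sampled elements rather than with the length of the stream. How the sampling probability should depend on $p$, and which exchange rule yields the stated guarantees, is specified in the paper itself and not reproduced here.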
