Deletion-Robust Submodular Maximization: Data Summarization with "the Right to be Forgotten"

How can we summarize a dynamic data stream when elements selected for the summary can be deleted at any time? This is an important challenge in online services, where the users generating the data may decide to exercise their right to restrict the service provider from using (part of) their data due to privacy concerns. Motivated by this challenge, we introduce the dynamic deletion-robust submodular maximization problem. We develop the first resilient streaming algorithm, called ROBUST-STREAMING, with a constant-factor approximation guarantee relative to the optimum solution. We evaluate the effectiveness of our approach on several real-world applications, including summarizing (1) streams of geo-coordinates; (2) streams of images; and (3) click-stream log data, consisting of 45 million feature vectors from a news recommendation task.
