PRISM: A Unified Framework of Parameterized Submodular Information Measures for Targeted Data Subset Selection and Summarization

As datasets grow, techniques for finding smaller yet effective subsets with specific characteristics become increasingly important. Motivated by this, we present PRISM, a rich class of PaRameterIzed Submodular information Measures that can be used in applications where such targeted subsets are desired. We demonstrate the utility of PRISM in two such applications. First, we apply PRISM to improve a supervised model’s performance at a given additional labeling cost through targeted subset selection (PRISM-TSS), where a subset of unlabeled points matching a target set is added to the training set. We show that PRISM-TSS generalizes and is connected to several existing approaches to targeted data subset selection. Second, we apply PRISM to the more nuanced task of targeted summarization (PRISM-TSUM), where data (e.g., image collections, text, or videos) is summarized for quicker human consumption while accounting for additional user intent. PRISM-TSUM handles multiple flavors of targeted summarization, such as query-focused, topic-irrelevant, privacy-preserving, and update summarization, in a unified way. We show that PRISM-TSUM also generalizes and unifies several existing works on targeted summarization. Through extensive experiments on image classification and image-collection summarization, we empirically verify the superiority of PRISM-TSS and PRISM-TSUM over the state of the art.
