Self-Supervision based Task-Specific Image Collection Summarization

A. Singh
Department of Computer Science, Technical University of Munich, Munich
E-mail: anuragsingh2@iisc.ac.in

*D.K. Sharma (Corresponding Author)
Department of Information Technology, Netaji Subhas University of Technology, New Delhi, India
E-mail: dk.sharma1982@yahoo.com

Department of Computer Science, Institute of Information Technology and Management, New Delhi
E-mail: sharmashudhir08@gmail.com

Successful applications of deep learning (DL) require large amounts of annotated data, which often restricts the benefits of DL to businesses and individuals with large budgets for data collection and computation. Summarization offers a possible solution: it creates much smaller representative datasets that can allow real-time deep learning and analysis of big data, and thus democratizes the use of DL. In this work, we explore a novel approach to task-specific image corpus summarization using semantic information and self-supervision. Our method uses a classification-based Wasserstein generative adversarial network (CLSWGAN) as a feature-generating network. The model also leverages rotational invariance as self-supervision, along with classification on another task. All of these objectives are applied to features from ResNet34 to make them discriminative and robust. At inference time, the model generates a summary by K-means clustering in the semantic embedding space. A further advantage of the model is therefore that it does not need to be retrained to obtain summaries of different lengths, which is an issue with current end-to-end models. We also evaluate the model's efficacy through rigorous qualitative and quantitative experiments.
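The inference-time step described in the abstract, K-means clustering in the embedding space with the summary length chosen freely, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `kmeans` and `summarize`, the farthest-first initialisation, the rule of returning the real item nearest to each centroid, and the toy 2-D vectors standing in for learned ResNet34 features are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's K-means on equal-length float vectors; returns k centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic farthest-first initialisation, so the sketch
    # is reproducible without a random seed.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def summarize(embeddings, summary_length):
    """Return sorted indices of the items nearest to each K-means centroid.

    Assumption: the summary consists of real items, one per cluster.
    """
    chosen = []
    for centroid in kmeans(embeddings, summary_length):
        ranked = sorted(range(len(embeddings)),
                        key=lambda i: math.dist(embeddings[i], centroid))
        # Skip items already selected so the summary has distinct entries.
        chosen.append(next(i for i in ranked if i not in chosen))
    return sorted(chosen)


# Hypothetical embeddings: three well-separated groups of three items each.
toy_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1],
    [0.0, 10.0], [0.1, 10.0], [0.0, 10.1],
]

# The same embeddings yield summaries of different lengths without retraining.
for length in (2, 3):
    print(length, summarize(toy_embeddings, length))
```

Because clustering runs only on fixed embeddings, changing `summary_length` requires no change to the feature extractor, which is the property the abstract highlights. In practice a library implementation of K-means would likely replace the hand-written loop.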
