Image Collection Summarization: Past, Present and Future

Recent trends make it evident that data will continue to be generated in ever larger volumes. Our ability to provide services to customers will likely be limited by the kind of analysis and knowledge that we can extract from that data. Images constitute a fair share of the information carried by the many forms of media used for communication, alongside text, video, audio, and their meaningful combinations. Summarization of videos and events has been of recent interest to the computer vision and multimedia research communities, and recent advances in optimization, especially deep learning, have brought significant improvements in video summarization. Image collection summarization, by contrast, is an important task that continues to elude a solution because of its inherent challenges and its differences from video summarization. Because consecutive video frames are strongly linked in time, recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) networks can exploit this temporal structure, which makes them useful when designing deep learning architectures for event and video summarization. Similarly, for text, the sentences of a long passage follow a high-level sequence that can be exploited for summarization. In a collection of images, however, there is no temporal sequence between two images for a network to exploit [14, 24]. As a result, the problem has remained esoteric. To remedy this, this article presents the challenges in the field of image collection summarization, the need for gold standards in the definition of summarization, datasets and quantitative evaluation metrics based on those datasets, and the major papers in the area that have aimed to solve the problem in the past.

[1] John R. Kender, et al. Video Summaries through Mosaic-Based Shot and Scene Clustering, 2002, ECCV.

[2] Ke Zhang, et al. Summary Transfer: Exemplar-Based Subset Selection for Video Summarization, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Sven J. Dickinson, et al. Selecting canonical views for view-based 3-D object recognition, 2004, ICPR 2004.

[4] Rishabh K. Iyer, et al. Learning Mixtures of Submodular Functions for Image Collection Summarization, 2014, NIPS.

[5] Steven M. Seitz, et al. Scene Summarization for Online Image Collections, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Jianping Fan, et al. Image collection summarization via dictionary learning for sparse representation, 2013, Pattern Recognit.

[8] William T. Freeman, et al. The generic viewpoint assumption in a framework for visual perception, 1994, Nature.

[9] Ke Zhang, et al. Video Summarization with Long Short-Term Memory, 2016, ECCV.

[10] Yann LeCun, et al. Energy-based Generative Adversarial Network, 2016, ICLR.

[11] Arnaldo de Albuquerque Araújo, et al. VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method, 2011, Pattern Recognit. Lett.

[12] Bernard Mérialdo, et al. Multi-video summarization based on Video-MMR, 2010, 11th International Workshop on Image Analysis for Multimedia Interactive Services WIAMIS 10.

[13] Luc Van Gool, et al. Video summarization by learning submodular mixtures of objectives, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Svetlana Lazebnik, et al. Enhancing Video Summarization via Vision-Language Embedding, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Michael Lam, et al. Unsupervised Video Summarization with Adversarial LSTM Networks, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Chih-Jen Lin, et al. Large-Scale Video Summarization Using Web-Image Priors, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Luc Van Gool, et al. Creating Summaries from User Videos, 2014, ECCV.

[18] Ben Taskar, et al. Determinantal Point Processes for Machine Learning, 2012, Found. Trends Mach. Learn.

[19] Peter M. Hall, et al. Simple Canonical Views, 2005, BMVC.

[20] Kristen Grauman, et al. Diverse Sequential Subset Selection for Supervised Video Summarization, 2014, NIPS.

[21] Ramesh C. Jain, et al. Summarization of personal photologs using multidimensional content and context, 2011, ICMR '11.