Image collection summarization via dictionary learning for sparse representation

In this paper, we develop a novel approach to automatic image collection summarization. The effectiveness of a summary is reflected by its ability to reconstruct the original set, or each individual image in the set. We leverage the dictionary learning model for sparse representation both to construct the summary and to represent the images. Specifically, we reformulate summarization as a dictionary learning problem: select bases that can be sparsely combined to represent the original images while achieving a minimum global reconstruction error, measured for example by the mean squared error (MSE). The resulting "Sparse Least Square" problem is NP-hard, so a simulated annealing algorithm is adopted to learn the dictionary, i.e. the image summary, by minimizing the proposed optimization function. A quantitative measure is defined for assessing the quality of an image summary in terms of both its reconstruction ability and its representativeness of large original image sets. We also compare the performance of our approach with that of six baseline summarization tools on multiple image sets (ImageNet, NUS-WIDE-SCENE and an Event image set). Our experimental results show that the proposed dictionary learning approach obtains more accurate results than the six baseline summarization algorithms.
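The formulation described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the cost function, the sparse coder, or the annealing schedule, so everything below is an assumption made for illustration. Each image is assumed to be a feature vector stored as a column of `X`, a summary is a set of `k` column indices, the sparse coder is a plain greedy orthogonal matching pursuit, and the names `omp_error`, `summary_cost` and `anneal_summary`, along with the parameters `t0` and `cooling`, are hypothetical.

```python
import numpy as np

def omp_error(D, x, sparsity):
    """Squared residual after greedily coding x with at most `sparsity` columns of D."""
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty atoms
    residual = x.copy()
    chosen = []
    for _ in range(min(sparsity, D.shape[1])):
        scores = np.abs(D.T @ residual) / norms
        if chosen:
            scores[chosen] = -1.0  # never pick the same atom twice
        chosen.append(int(np.argmax(scores)))
        # Refit all chosen atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, chosen], x, rcond=None)
        residual = x - D[:, chosen] @ coef
    return float(residual @ residual)

def summary_cost(X, idx, sparsity):
    """Mean squared reconstruction error of every image from the summary columns."""
    D = X[:, idx]
    n = X.shape[1]
    return sum(omp_error(D, X[:, j], sparsity) for j in range(n)) / n

def anneal_summary(X, k, sparsity=2, steps=300, t0=1.0, cooling=0.98, seed=0):
    """Simulated annealing over k-subsets of columns (assumed schedule, not the paper's)."""
    n = X.shape[1]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of images")
    rng = np.random.default_rng(seed)
    current = [int(i) for i in rng.choice(n, size=k, replace=False)]
    cur_cost = summary_cost(X, current, sparsity)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        # Propose a neighbour: swap one summary image for one outside the summary.
        outside = [j for j in range(n) if j not in current]
        candidate = list(current)
        candidate[int(rng.integers(k))] = outside[int(rng.integers(len(outside)))]
        cand_cost = summary_cost(X, candidate, sparsity)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_cost < cur_cost or rng.random() < np.exp((cur_cost - cand_cost) / temp):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        temp *= cooling
    return sorted(best), best_cost
```

Calling `anneal_summary(X, k=4)` on a feature matrix `X` returns the selected image indices and their mean reconstruction error. Restricting the dictionary to columns of `X` keeps every basis an actual image from the collection, which is what makes the learned dictionary usable as a summary in this sketch.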
