Adaptive Greedy Dictionary Selection for Web Media Summarization

Initializing an effective dictionary is an indispensable step in sparse representation. In this paper, we focus on the dictionary selection problem, where the objective is to select a compact subset of basis elements from the original training data rather than to learn a new dictionary matrix, as dictionary learning models do. We first design a new dictionary selection model based on the ℓ2,0 norm. For model optimization, we propose two methods: the first is the standard forward-backward greedy algorithm, which is not suitable for large-scale problems; the second uses gradient cues at each forward iteration and speeds up the process dramatically. Compared with state-of-the-art dictionary selection models, our model is not only more effective and efficient, but also allows the sparsity to be controlled. To evaluate the new model, we select two practical web media summarization problems: 1) we build a new data set consisting of around 500 users, 3000 albums, and 1 million images, and achieve effective assisted albuming based on our model; and 2) by formulating video summarization as a dictionary selection problem, we employ our model to extract keyframes from a video sequence in a more flexible way. Overall, our model outperforms the state-of-the-art methods on both tasks.
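
To make the idea of gradient-guided selection concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes the common self-representation objective, minimizing 0.5‖B − BX‖_F² over X with at most k non-zero rows (the ℓ2,0 constraint), where the columns of B are the training samples. It performs only forward steps, choosing at each iteration the row whose gradient has the largest ℓ2 norm and then refitting by least squares; the backward (removal) step is omitted. All names (`row_l20_norm`, `refit`, `greedy_select`) and the toy matrix are hypothetical.

```python
import numpy as np


def row_l20_norm(X, tol=1e-12):
    """Count the rows of X whose l2 norm is non-zero (the l2,0 'norm')."""
    return int(np.sum(np.linalg.norm(X, axis=1) > tol))


def objective(B, X):
    """Reconstruction error 0.5 * ||B - B X||_F^2 (assumed objective)."""
    return 0.5 * np.linalg.norm(B - B @ X, "fro") ** 2


def refit(B, support):
    """Least-squares coefficients with non-zero rows only on `support`."""
    n = B.shape[1]
    X = np.zeros((n, n))
    if support:
        D = B[:, support]                               # selected columns
        coef, *_ = np.linalg.lstsq(D, B, rcond=None)    # shape (|S|, n)
        X[support, :] = coef
    return X


def greedy_select(B, k):
    """Forward-only greedy selection of k columns of B using gradient cues."""
    n = B.shape[1]
    support = []
    X = np.zeros((n, n))
    for _ in range(k):
        grad = -B.T @ (B - B @ X)                # gradient of the objective
        scores = np.linalg.norm(grad, axis=1)    # one score per candidate row
        if support:
            scores[support] = -np.inf            # never pick a column twice
        support.append(int(np.argmax(scores)))
        X = refit(B, support)
    return support, X


# Toy data: six samples in R^3 that lie along two directions only.
B = np.array([
    [1.0, 2.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
support, X = greedy_select(B, k=2)
print("selected columns:", support)
print("non-zero rows:", row_l20_norm(X), "error:", objective(B, X))
```

In this toy case two selected columns, one per direction, are enough to reconstruct every sample, so the error drops to numerical zero. The budget k plays the role of the sparsity control mentioned above: it bounds the number of selected samples directly instead of going through a regularization weight.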
