Dissimilarity-Based Sparse Subset Selection

Finding an informative subset of a large collection of data points or models is central to many problems in computer vision, recommender systems, and bio/health informatics, as well as in image and natural language processing. Given pairwise dissimilarities between the elements of a "source set" and a "target set," we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, NP-hard, we consider a convex relaxation. The solution of our optimization yields the representatives together with the assignment of each element of the target set to a representative, and hence a clustering. We analyze the solution of the proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, our algorithm finds representatives from all groups and reveals the clustering of the sets. In addition, we show that the proposed framework can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To implement our algorithm efficiently, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows the algorithm to be parallelized, further reducing the computational time. Finally, through experiments on real-world datasets, we show that our proposed algorithm improves on the state of the art in two problems: scene categorization using representative images, and time-series modeling and segmentation using representative models.
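To make the role of the regularization parameter concrete, the following is a minimal sketch of how a relaxed objective of this kind could be evaluated, not the authors' implementation. It assumes details the abstract does not state: the cost is taken to be the trace term sum_ij D[i][j] * Z[i][j] plus lam times the sum of the l2 norms of the rows of Z, and Z is assumed to be nonnegative with each column summing to one. The function names, the toy matrix D, and the two candidate matrices are all hypothetical, and the code only compares hand-picked feasible candidates; it does not solve the optimization.

```python
import math

def ds3_objective(D, Z, lam):
    """Assumed relaxed cost: sum_ij D[i][j]*Z[i][j] + lam * sum_i ||Z[i]||_2."""
    encoding = sum(d * z for d_row, z_row in zip(D, Z)
                   for d, z in zip(d_row, z_row))
    row_sparsity = sum(math.sqrt(sum(z * z for z in z_row)) for z_row in Z)
    return encoding + lam * row_sparsity

def is_feasible(Z, tol=1e-9):
    """Assumed constraints: entries are nonnegative, each column sums to one."""
    nonneg = all(z >= -tol for row in Z for z in row)
    cols_ok = all(abs(sum(row[j] for row in Z) - 1.0) <= tol
                  for j in range(len(Z[0])))
    return nonneg and cols_ok

def representatives(Z, tol=1e-6):
    """Source indices whose row of Z is not (numerically) zero."""
    return [i for i, row in enumerate(Z) if max(row) > tol]

def assignments(Z):
    """Assign each target j to the source with the largest Z[i][j]."""
    return [max(range(len(Z)), key=lambda i: Z[i][j])
            for j in range(len(Z[0]))]

# Hypothetical toy data: 3 source items (rows), 4 target items (columns).
D = [
    [0.1, 0.2, 5.0, 5.0],  # source 0 is close to targets 0 and 1
    [5.0, 5.0, 0.1, 0.3],  # source 1 is close to targets 2 and 3
    [2.0, 2.0, 2.0, 2.0],  # source 2 is moderately far from every target
]

# Candidate A: two representatives, one per group of targets.
Z_two = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
# Candidate B: a single representative for all targets.
Z_one = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]

for lam in (1.0, 10.0):
    print(lam, ds3_objective(D, Z_two, lam), ds3_objective(D, Z_one, lam))
print(representatives(Z_two), assignments(Z_two))
```

With these toy numbers, the two-representative candidate has the lower cost at lam = 1, while the single-representative candidate has the lower cost at lam = 10. This illustrates the trade-off that the regularization parameter controls between encoding cost and the number of selected representatives.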
