Joint Representative Selection and Feature Learning: A Semi-Supervised Approach

We propose a semi-supervised approach to representative selection, the task of finding a small set of representatives that summarizes a large data collection well. Given labeled source data and a large unlabeled target dataset, we aim to find representatives in the target data that not only represent and associate the data points belonging to each labeled category, but also discover novel categories in the target data, if any. To leverage the labeled source data, we guide representative selection from the labeled source to the unlabeled target. We propose a joint optimization framework that alternates between (1) representative selection in the target data and (2) discriminative feature learning from both the source and the target data, which in turn improves representative selection. Experiments on image and video datasets demonstrate that the proposed approach not only finds better representatives, but can also discover novel categories in the target data that are absent from the source.
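To make the notion of representative selection concrete, the following is a minimal sketch of a generic greedy, k-medoids-style selector. It is not the joint optimization framework described above: it ignores labels and feature learning entirely, and the function name `select_representatives`, the Euclidean distance, and the toy points are all assumptions made for illustration.

```python
def select_representatives(points, k):
    """Greedily pick k indices whose points best summarize `points`.

    Illustrative only: at each step, add the point that most reduces the
    total distance from every point to its nearest chosen representative.
    """
    def dist(a, b):
        # Euclidean distance (an assumed dissimilarity measure).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n = len(points)
    # nearest[i] = distance from point i to its closest representative so far.
    nearest = [float("inf")] * n
    chosen = []
    for _ in range(min(k, n)):
        best_cost, best_j = None, None
        for j in range(n):
            if j in chosen:
                continue
            # Total summarization cost if point j were added as a representative.
            cost = sum(min(nearest[i], dist(points[i], points[j])) for i in range(n))
            if best_cost is None or cost < best_cost:
                best_cost, best_j = cost, j
        chosen.append(best_j)
        nearest = [min(nearest[i], dist(points[i], points[best_j])) for i in range(n)]
    return chosen


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
reps = select_representatives(points, 2)
```

On this toy input the selector returns one index from each group, which is the behavior a summarizing set of representatives is expected to have. In the setting of the abstract, the distances would presumably be computed in a learned feature space rather than on raw coordinates.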
