Paper Information - Iterative Projection and Matching: Finding Structure-Preserving Representatives and Its Application to Computer Vision

The goal of data selection is to capture as much structural information as possible from a set of data. This paper presents a fast and accurate data selection method in which the selected samples are optimized to span the subspace of all data. We propose a new selection algorithm, referred to as iterative projection and matching (IPM), with linear complexity with respect to the number of data points and no parameters to tune. At each iteration, one selected sample captures the maximum information from the structure of the data, and the captured information is removed from subsequent iterations by projecting the data onto the null space of the previously selected samples. The proposed algorithm outperforms conventional methods in both computational efficiency and selection accuracy. Its superiority is further shown on active learning for video action recognition on the UCF-101 dataset; learning using representatives on ImageNet; training a generative adversarial network (GAN) to generate multi-view images from a single-view input on the CMU Multi-PIE dataset; and video summarization on the UTE Egocentric dataset.
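
The select-then-project loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the direction carrying the "maximum information" is the first right singular vector of the current residual data, and that "matching" means picking the sample with the largest absolute inner product with that direction. The function name `ipm_select`, the `eps` tolerance, and the toy data are hypothetical. A full SVD is used per iteration for brevity, so the sketch does not reproduce the linear complexity claimed for IPM.

```python
import numpy as np


def ipm_select(data, k, eps=1e-12):
    """Return up to k row indices chosen by a project-and-match loop (sketch)."""
    residual = np.array(data, dtype=float)  # working copy; rows are samples
    selected = []
    for _ in range(k):
        # Assumption: the dominant direction of the residual data is its
        # first right singular vector.
        _, _, vt = np.linalg.svd(residual, full_matrices=False)
        direction = vt[0]
        # Matching: pick the sample most aligned with that direction.
        scores = np.abs(residual @ direction)
        idx = int(np.argmax(scores))
        norm = np.linalg.norm(residual[idx])
        if norm < eps:
            break  # nothing left to capture; the data is already spanned
        selected.append(idx)
        # Projection: remove the selected sample's direction from every row,
        # i.e. project the data onto the null space of that sample.
        unit = residual[idx] / norm
        residual -= np.outer(residual @ unit, unit)
    return selected


# Toy data: rows 0 and 1 are nearly parallel, rows 2 and 3 add new directions.
data = [
    [3.0, 0.0, 0.0],
    [2.9, 0.1, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(ipm_select(data, 3))  # -> [0, 2, 3]
```

In the toy data, row 1 is almost a copy of row 0, so once row 0 is selected and projected out, row 1 carries little remaining energy and is skipped in favor of rows 2 and 3. A scalable variant would compute only the leading singular vector instead of a full SVD.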
