Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds

In this paper, we consider the Precis problem of sampling K representative yet diverse data points from a large dataset. This problem arises frequently in applications such as video and document summarization, exploratory data analysis, and pre-filtering. We formulate a general theory which encompasses not just traditional techniques devised for vector spaces, but also non-Euclidean manifolds, thereby enabling these techniques to shapes, human activities, textures and many other image and video based datasets. We propose intrinsic manifold measures for measuring the quality of a selection of points with respect to their representative power, and their diversity. We then propose efficient algorithms to optimize the cost function using a novel annealing-based iterative alternation algorithm. The proposed formulation is applicable to manifolds of known geometry as well as to manifolds whose geometry needs to be estimated from samples. Experimental results show the strength and generality of the proposed approach.

[1]  P. Thomas Fletcher,et al.  Principal geodesic analysis for the study of nonlinear statistics of shape , 2004, IEEE Transactions on Medical Imaging.

[2]  Hongbin Zha,et al.  Riemannian Manifold Learning , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Anuj Srivastava,et al.  On Shape of Plane Elastic Curves , 2007, International Journal of Computer Vision.

[4]  Thorsten Joachims,et al.  Predicting diverse subsets using structural SVMs , 2008, ICML '08.

[5]  Rama Chellappa,et al.  Video Précis: Highlighting Diverse Aspects of Videos , 2010, IEEE Transactions on Multimedia.

[6]  S. Muthukrishnan,et al.  Relative-Error CUR Matrix Decompositions , 2007, SIAM J. Matrix Anal. Appl..

[7]  Gregory D. Hager,et al.  Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions , 2009, CVPR.

[8]  Stefano Soatto,et al.  Dynamic Textures , 2003, International Journal of Computer Vision.

[9]  A. Winsor Sampling techniques. , 2000, Nursing times.

[10]  Inderjit S. Dhillon,et al.  Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.

[11]  Anuj Srivastava,et al.  Statistical shape analysis: clustering, learning, and testing , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Peter Meer,et al.  Nonlinear Mean Shift for Clustering over Analytic Manifolds , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Ulrich Eckhardt,et al.  Shape descriptors for non-rigid shapes with a single closed contour , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[14]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[15]  Alan M. Frieze,et al.  Fast monte-carlo algorithms for finding low-rank approximations , 2004, JACM.

[16]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Jianhong Wu,et al.  Data clustering - theory, algorithms, and applications , 2007 .

[18]  Ming Gu,et al.  Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization , 1996, SIAM J. Sci. Comput..

[19]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[20]  Josef Stoer,et al.  Numerische Mathematik 1 , 1989 .

[21]  John D. Lafferty,et al.  Diffusion Kernels on Statistical Manifolds , 2005, J. Mach. Learn. Res..

[22]  D. Kendall SHAPE MANIFOLDS, PROCRUSTEAN METRICS, AND COMPLEX PROJECTIVE SPACES , 1984 .

[23]  Luis Rademacher,et al.  Efficient Volume Sampling for Row/Column Subset Selection , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[24]  GuMing,et al.  Efficient algorithms for computing a strong rank-revealing QR factorization , 1996 .

[25]  Alain Trouvé,et al.  Diffeomorphisms Groups and Pattern Matching in Image Analysis , 1998, International Journal of Computer Vision.

[26]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[27]  K.A. Gallivan,et al.  Efficient algorithms for inferences on Grassmann manifolds , 2004, IEEE Workshop on Statistical Signal Processing, 2003.

[28]  René Vidal,et al.  Clustering and dimensionality reduction on Riemannian manifolds , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Evimaria Terzi,et al.  ManyAspects: a system for highlighting diverse concepts in documents , 2008, Proc. VLDB Endow..

[30]  H. Karcher Riemannian center of mass and mollifier smoothing , 1977 .

[31]  Christos Boutsidis,et al.  An improved approximation algorithm for the column subset selection problem , 2008, SODA.

[32]  Gene H. Golub,et al.  Numerical methods for solving linear least squares problems , 1965, Milestones in Matrix Computation.

[33]  Guojun Gan,et al.  Data Clustering: Theory, Algorithms, and Applications (ASA-SIAM Series on Statistics and Applied Probability) , 2007 .

[34]  Michael Werman,et al.  Affine Invariance Revisited , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).