Summarization and Search Over Geometric Spaces

The last decade has seen an explosion in the amount of data being generated, in part due to the prevalence of image and video sensors. As a result, searching these data for relevant information, or even getting the gist of them, is becoming increasingly difficult. The task is further complicated when the data have a non-Euclidean geometric interpretation. In this chapter, we address these challenges by discussing techniques to (a) summarize the data and (b) search the data for nearest neighbors, in the general case of data lying on non-Euclidean manifolds. First, we consider the “precis” problem of sampling K representative yet diverse data points from a large data set. We formulate a general theory that encompasses not only traditional techniques devised for vector spaces but also non-Euclidean manifolds, thereby extending these techniques to shapes, human activities, textures, and many other image- and video-based data sets. We discuss intrinsic manifold measures of the quality of a selection of points with respect to both their representative power and their diversity. We also extend our formulation to infinite-dimensional manifolds. We then address the problem of nearest-neighbor search in curved spaces. To this end, we discuss geodesic hashing, which employs intrinsic geodesic-based functions to hash the data for approximate but fast nearest-neighbor retrieval. The proposed family of hashing functions, although intrinsic, is optimally selected so as to empirically satisfy the locality-sensitive hashing property.
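
To make the two ideas concrete, the following is a minimal sketch and not the chapter's algorithm. It assumes the simplest curved space, the unit sphere, where the geodesic distance is the great-circle angle. Diverse selection is approximated by greedy farthest-point (k-center) sampling, a standard technique swapped in for illustration, and the hash simply quantizes geodesic distances to a few anchor points. The function names `geodesic_dist`, `greedy_diverse_subset`, and `geodesic_hash`, as well as the `width` parameter, are hypothetical, and no locality-sensitive guarantee is claimed for this toy hash.

```python
import numpy as np

def geodesic_dist(x, y):
    """Great-circle distance from each unit vector in x (rows) to unit vector y.

    Assumption: the manifold is the unit sphere, so the geodesic
    distance is the angle between the two vectors.
    """
    return np.arccos(np.clip(x @ y, -1.0, 1.0))

def greedy_diverse_subset(points, k, start=0):
    """Pick k indices by farthest-point selection under geodesic distance.

    Each new pick is the point farthest from all representatives chosen
    so far, which spreads the picks over the data (diversity) while
    shrinking the largest point-to-representative distance (coverage).
    """
    chosen = [start]
    # Distance from every point to its nearest chosen representative.
    nearest = geodesic_dist(points, points[start])
    for _ in range(k - 1):
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, geodesic_dist(points, points[nxt]))
    return chosen

def geodesic_hash(points, anchors, width):
    """Illustrative hash: quantize geodesic distances to anchor points.

    Returns one integer per (point, anchor) pair; points whose rows
    match fall in the same bucket.
    """
    d = np.stack([geodesic_dist(points, a) for a in anchors], axis=1)
    return np.floor(d / width).astype(int)

# Synthetic data: 200 random points on the unit sphere in R^3.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

reps = greedy_diverse_subset(pts, k=5)           # indices of 5 representatives
keys = geodesic_hash(pts, pts[reps], width=0.5)  # one bucket key per row
```

At query time, a point would be hashed with the same anchors and compared only against data points sharing its bucket key, trading exactness for speed. Only the intrinsic distance function is manifold-specific here, so the same structure would carry over to other manifolds given their geodesic distance.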
