Data fusion through cross-modality metric learning using similarity-sensitive hashing

Visual understanding is often based on measuring similarity between observations. Learning similarities specific to a given perception task from a set of examples has been shown to be advantageous in various computer vision and pattern recognition problems. In many important applications, the data to be compared come from different representations or modalities, and the similarity must operate on objects that may have different and often incommensurable structure and dimensionality. In this paper, we propose a framework for supervised similarity learning based on embedding the input data from two arbitrary spaces into the Hamming space. Learning the mapping is expressed as a binary classification problem with positive and negative examples, and the mapping can be learned efficiently using boosting algorithms. The utility and efficiency of this generic approach are demonstrated on several challenging applications, including cross-representation shape retrieval and alignment of multi-modal medical images.
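To make the idea of embedding two different spaces into a common Hamming space more concrete, the following is a minimal sketch under stated assumptions, not the method as specified in the paper. The abstract does not give the form of the embedding or the boosting variant, so the affine-threshold hash functions, the random candidate pool, the AdaBoost-style weight update, and all names (`hash_bit`, `embed`, `hamming`, `train`) are illustrative assumptions.

```python
import math
import random


def hash_bit(v, p, a):
    """One bit of the code: 1 if the affine projection p.v + a is positive."""
    return 1 if sum(pi * vi for pi, vi in zip(p, v)) + a > 0 else 0


def embed(v, projections):
    """Map a vector to a binary code, one bit per (p, a) projection."""
    return [hash_bit(v, p, a) for p, a in projections]


def hamming(c1, c2):
    """Number of positions in which two equal-length codes differ."""
    return sum(b1 != b2 for b1, b2 in zip(c1, c2))


def train(pairs, labels, n_bits, n_candidates=50, seed=0):
    """Greedy AdaBoost-style selection of one projection pair per bit.

    pairs  : list of (x, y), x from modality X and y from modality Y
    labels : +1 for similar pairs, -1 for dissimilar pairs
    Assumption: candidates are drawn at random rather than optimized.
    """
    rng = random.Random(seed)
    dx, dy = len(pairs[0][0]), len(pairs[0][1])
    w = [1.0 / len(pairs)] * len(pairs)  # uniform initial pair weights
    proj_x, proj_y = [], []
    for _ in range(n_bits):
        best = None
        for _ in range(n_candidates):
            p = [rng.gauss(0, 1) for _ in range(dx)]
            q = [rng.gauss(0, 1) for _ in range(dy)]
            a, b = rng.gauss(0, 1), rng.gauss(0, 1)
            # Weak classifier: +1 if the two bits agree, -1 otherwise.
            h = [1 if hash_bit(x, p, a) == hash_bit(y, q, b) else -1
                 for x, y in pairs]
            # Weighted agreement between weak classifier and labels.
            r = sum(wk * sk * hk for wk, sk, hk in zip(w, labels, h))
            if best is None or r > best[0]:
                best = (r, p, a, q, b, h)
        r, p, a, q, b, h = best
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        proj_x.append((p, a))
        proj_y.append((q, b))
        # Re-weight so that misclassified pairs count more in the next round.
        w = [wk * math.exp(-alpha * sk * hk)
             for wk, sk, hk in zip(w, labels, h)]
        z = sum(w)
        w = [wk / z for wk in w]
    return proj_x, proj_y


if __name__ == "__main__":
    # Toy data: X is 2-D, Y is 3-D; pairs sharing a sign are "similar".
    toy_pairs = [([1.0, 0.2], [1.0, 0.0, 0.3]),
                 ([-1.0, 0.1], [-1.0, 0.2, 0.0]),
                 ([1.0, -0.3], [-1.0, 0.1, 0.1]),
                 ([-1.0, 0.0], [1.0, -0.2, 0.2])]
    toy_labels = [1, 1, -1, -1]
    px, py = train(toy_pairs, toy_labels, n_bits=8)
    for (x, y), s in zip(toy_pairs, toy_labels):
        print(s, hamming(embed(x, px), embed(y, py)))
```

Once the two sets of projections are learned, objects from either modality are compared by the Hamming distance between their codes, which can be computed with bitwise operations regardless of the original dimensionalities. This sketch uses an unweighted Hamming distance and discards the per-bit boosting coefficients, which is a simplification.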
