Sparse Over-complete Patch Matching

Image patch matching, the process of identifying corresponding patches across images, is used as a subroutine in many computer vision and image processing tasks. State-of-the-art patch matching techniques feed image patches into a convolutional neural network that extracts patch features and evaluates their similarity. Our aim in this paper is to improve on these techniques by exploiting, for patch matching, the fact that a sparse, overcomplete representation of an image possesses the statistical properties of natural visual scenes. We propose a new paradigm that first sparse-codes each image patch and then uses this sparse representation as the input to a neural network that compares the patches. Because sparse coding is based on a generative model of natural image patches, it represents a patch in terms of the fundamental visual components from which it is composed, so patches built from similar components receive similar sparse codes. Once the sparse-coded features are extracted, we compare them with a fully-connected neural network, which captures non-linear relationships between the features. We evaluated our approach on the Liberty and Notre Dame subsets of the popular UBC patch dataset, where it sets a new benchmark, outperforming all state-of-the-art patch matching techniques on these datasets.
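The abstract does not say which sparse solver or dictionary is used. As a minimal sketch, assuming an L1-penalized (lasso) formulation solved with ISTA (iterative shrinkage-thresholding) and a random unit-norm dictionary standing in for one learned from natural image patches, the code below codes a flattened patch over an overcomplete basis and illustrates why patches built from the same atoms receive similar codes. The patch size, dictionary size, `lam`, and the helper names (`soft_threshold`, `sparse_code_ista`, `lasso_objective`, `cosine`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, D, lam=0.1, n_iter=500):
    """Approximately solve min_a 0.5 * ||x - D a||^2 + lam * ||a||_1 with ISTA.

    x: flattened patch of shape (d,)
    D: dictionary of shape (d, k) with unit-norm columns; k > d makes it overcomplete
    """
    # Step size 1/L, where L is the largest eigenvalue of D^T D
    # (the Lipschitz constant of the gradient of the quadratic term).
    L = np.linalg.norm(D, ord=2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of 0.5 * ||x - D a||^2
        a = soft_threshold(a - grad / L, lam / L)
    return a

def lasso_objective(x, D, a, lam=0.1):
    """Value of the objective that ISTA minimizes."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d, k = 64, 256  # hypothetical sizes: 8x8 patches, 4x overcomplete dictionary
# Random stand-in (assumption) for a dictionary learned from natural image patches.
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0, keepdims=True)

# Two patches composed of the same three atoms (the second slightly perturbed),
# and a third patch composed of three different atoms.
x1 = D[:, [5, 40, 200]] @ np.array([1.0, -0.8, 0.6])
x2 = x1 + 0.01 * rng.standard_normal(d)
x3 = D[:, [10, 100, 150]] @ np.array([0.9, 0.7, -0.5])

a1, a2, a3 = (sparse_code_ista(x, D) for x in (x1, x2, x3))

print("nonzeros in a1:", np.count_nonzero(a1), "of", k)
print("cos(a1, a2) =", round(cosine(a1, a2), 3))
print("cos(a1, a3) =", round(cosine(a1, a3), 3))
```

Because `x1` and `x2` are built from the same atoms, their codes share a small support and should have a cosine similarity close to 1, while the code of `x3`, built from disjoint atoms, should score much lower against `a1`.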
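For the comparison stage, the abstract states only that a fully-connected network is applied to the sparse-coded features. The sketch below assumes that the two codes are concatenated and passed through a single ReLU hidden layer to a sigmoid match probability, trained with binary cross-entropy by plain full-batch gradient descent in NumPy. The synthetic code pairs, layer sizes, learning rate, loss, and helper names (`random_sparse_code`, `make_pairs`, `init_params`, `forward`, `bce`, `train_step`) are all hypothetical and do not describe the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_sparse_code(k, n_active, rng):
    """Hypothetical stand-in for a sparse code: n_active nonzeros out of k."""
    a = np.zeros(k)
    idx = rng.choice(k, size=n_active, replace=False)
    a[idx] = rng.standard_normal(n_active)
    return a

def make_pairs(n, k, rng):
    """Alternate matching pairs (same support, small noise) and non-matching pairs."""
    X, y = [], []
    for i in range(n):
        a = random_sparse_code(k, 4, rng)
        if i % 2 == 0:
            b = a + 0.05 * rng.standard_normal(k) * (a != 0)
            y.append(1.0)
        else:
            b = random_sparse_code(k, 4, rng)
            y.append(0.0)
        X.append(np.concatenate([a, b]))  # assumption: the two codes are concatenated
    return np.array(X), np.array(y)

def init_params(n_in, n_hidden, rng):
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) * np.sqrt(2.0 / n_in),
        "b1": np.zeros(n_hidden),
        "w2": rng.standard_normal(n_hidden) * np.sqrt(1.0 / n_hidden),
        "b2": 0.0,
    }

def forward(p, X):
    h_pre = X @ p["W1"] + p["b1"]
    h = np.maximum(h_pre, 0.0)                  # ReLU hidden layer
    logits = h @ p["w2"] + p["b2"]
    prob = np.exp(-np.logaddexp(0.0, -logits))  # numerically stable sigmoid
    return h_pre, h, prob

def bce(prob, y, eps=1e-12):
    return float(-np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps)))

def train_step(p, X, y, lr):
    """One full-batch gradient-descent step; returns the loss before the update."""
    h_pre, h, prob = forward(p, X)
    dlogits = (prob - y) / len(y)               # d(mean BCE) / d(logits)
    dh = np.outer(dlogits, p["w2"]) * (h_pre > 0)
    grads = {
        "W1": X.T @ dh,
        "b1": dh.sum(axis=0),
        "w2": h.T @ dlogits,
        "b2": dlogits.sum(),
    }
    for name in p:
        p[name] = p[name] - lr * grads[name]
    return bce(prob, y)

k = 32
X, y = make_pairs(400, k, rng)
params = init_params(2 * k, 64, rng)
initial_loss = bce(forward(params, X)[2], y)
for _ in range(1000):
    train_step(params, X, y, lr=0.1)
final_loss = bce(forward(params, X)[2], y)
print(f"training BCE: {initial_loss:.3f} -> {final_loss:.3f}")
```

Feeding the concatenated codes to a hidden layer lets the network learn interactions between the two codes instead of fixing a distance such as L2 in advance, which is one reading of the non-linear relationships the abstract attributes to the fully-connected network.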
