Learning-Based Local Visual Representation and Indexing

Learning-Based Local Visual Representation and Indexing, reviews the state-of-the-art in visual content representation and indexing, introduces cutting-edge techniques in learning based visual representation, and discusses emerging topics in visual local representation, and introduces the most recent advances in content-based visual search techniques. Discusses state-of-the-art procedures in learning-based local visual representation. Shows how to master the basic techniques needed for building a large-scale visual search engine and indexing system Provides insight into how machine learning techniques can be leveraged to refine the visual recognition system, especially in the part of visual feature representation.

[1]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[2]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[5]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Cordelia Schmid,et al.  A contextual dissimilarity measure for accurate and efficient image search , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Wen Gao,et al.  Towards compact topical descriptors , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[10]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.

[11]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Hui Xiong,et al.  Discovering colocation patterns from spatial data sets: a general approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[13]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[15]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[19]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[20]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[22]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[23]  Pietro Perona,et al.  A sparse object category model for efficient learning and exhaustive recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Luc Van Gool,et al.  Efficient Mining of Frequent and Distinctive Feature Configurations , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  David Marr,et al.  VISION A Computational Investigation into the Human Representation and Processing of Visual Information , 2009 .

[26]  Andrew Zisserman,et al.  Video data mining using configurations of viewpoint invariant regions , 2004, CVPR 2004.

[27]  Cordelia Schmid,et al.  Comparing and evaluating interest points , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[28]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[29]  Svetlana Lazebnik,et al.  Locality-sensitive binary codes from shift-invariant kernels , 2009, NIPS.

[30]  Song-Chun Zhu,et al.  Minimax Entropy Principle and Its Application to Texture Modeling , 1997, Neural Computation.

[31]  David G. Lowe,et al.  Indexing without Invariants in 3D Object Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  Dan Roth,et al.  Learning a Sparse Representation for Object Detection , 2002, ECCV.

[33]  Maosong Sun,et al.  Automatic Image Annotation Based on WordNet and Hierarchical Ensembles , 2006, CICLing.

[34]  Hua Li,et al.  Mobile Search With Multimodal Queries , 2008, Proceedings of the IEEE.

[35]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[36]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[38]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[39]  Sameer A. Nene,et al.  A simple algorithm for nearest neighbor search in high dimensions , 1997 .

[40]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[41]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[42]  Yang Yang,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, CVPR.

[43]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[44]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[47]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[48]  Ehud Rivlin,et al.  Scale space semi-local invariants , 1997, Image Vis. Comput..

[49]  Lior Wolf,et al.  Image representations beyond histograms of gradients: The role of Gestalt descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[51]  Michael Isard,et al.  Bundling features for large scale partial-duplicate web image search , 2009, CVPR.

[52]  Ming Yang,et al.  Discovery of Collocation Patterns: from Visual Words to Visual Phrases , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Cordelia Schmid,et al.  Accurate Image Search Using the Contextual Dissimilarity Measure , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[55]  Trevor Darrell,et al.  Approximate Correspondences in High Dimensions , 2006, NIPS.

[56]  Lei Wang Toward A Discriminative Codebook: Codeword Selection across Multi-resolution , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Wen Gao,et al.  Location Discriminative Vocabulary Coding for Mobile Landmark Search , 2012, International Journal of Computer Vision.

[58]  M. Gazzaniga,et al.  Cognitive Neuroscience: The Biology of the Mind , 1998 .

[59]  Sven J. Dickinson,et al.  Using Language to Drive the Perceptual Grouping of Local Image Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[60]  B. S. Manjunath,et al.  An axiomatic approach to corner detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[61]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[62]  Ted Pedersen,et al.  WordNet::Similarity - Measuring the Relatedness of Concepts , 2004, NAACL.

[63]  Karl Rohr,et al.  Localization properties of direct corner detectors , 1994, Journal of Mathematical Imaging and Vision.

[64]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[65]  Gang Hua,et al.  Discriminant Embedding for Local Image Descriptors , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[66]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[67]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[68]  Hans P. Moravec Obstacle avoidance and navigation in the real world by a seeing robot rover , 1980 .

[69]  Baining Guo,et al.  Real-time texture synthesis by patch-based sampling , 2001, TOGS.

[70]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[71]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Indexing , 1999, SIGIR Forum.

[72]  Hanan Samet,et al.  Index-driven similarity search in metric spaces (Survey Article) , 2003, TODS.

[73]  Richard Szeliski,et al.  Multi-image matching using multi-scale oriented patches , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[74]  C. Schmid,et al.  Indexing based on scale invariant interest points , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[75]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[76]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[77]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[78]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[79]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[80]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[81]  Matthew A. Brown,et al.  Learning Local Image Descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[83]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[84]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[85]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[86]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[87]  Murat Dundar,et al.  Joint Optimization of Cascaded Classifiers for Computer Aided Detection , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[88]  Raymond J. Mooney,et al.  A probabilistic framework for semi-supervised clustering , 2004, KDD.

[89]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[90]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[91]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[92]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[93]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[94]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[95]  Cordelia Schmid,et al.  Semi-Local Affine Parts for Object Recognition , 2004, BMVC.

[96]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[97]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[98]  Svetlana Lazebnik,et al.  Supervised Learning of Quantizer Codebooks by Information Loss Minimization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[99]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[100]  Kenneth Rose,et al.  A generalized VQ method for combined compression and estimation , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[101]  Bernd Girod,et al.  CHoG: Compressed histogram of gradients A low bit-rate feature descriptor , 2009, CVPR.

[102]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[103]  Lucas Paletta,et al.  Q-learning of sequential attention for visual object recognition from informative local descriptors , 2005, ICML.

[104]  Antonio Torralba,et al.  Contextual Models for Object Detection Using Boosted Random Fields , 2004, NIPS.

[105]  Antonio Torralba,et al.  Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. , 2006, Psychological review.

[106]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[107]  Teuvo Kohonen,et al.  Self-Organizing Maps , 2010 .

[108]  Glen P. Abousleman,et al.  Dynamic point selection in image mosaicking , 2006 .

[109]  Luc Van Gool,et al.  Video mining with frequent itemset configurations , 2006 .

[110]  Luc Van Gool,et al.  Fast indexing for image retrieval based on local appearance with re-ranking , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[111]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[112]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .