Exploring visual dictionaries: A model driven perspective

Abstract Good representative dictionaries are the most critical part of the Bag of Visual Words (BoVW) scheme, used for tasks such as category identification. Learning dictionaries from datasets is by far the most widely used approach, and a plethora of methods exist for it. Dictionary learning methods demand abundant data; when training data is limited, the quality of the dictionaries, and consequently the performance of BoVW methods, suffers. A much less explored path for creating visual dictionaries starts from knowledge of primitives in appearance models and creates families of parametric shape models. In this work, we develop shape models starting from a small number of primitives and build a visual dictionary using various nonlinear operations and nonlinear combinations. Compared with existing model-driven schemes, our method represents and characterizes images in various image understanding applications with competitive, and often better, performance.
