Tools for visual scene recognition

Scene recognition is an important step towards a full understanding of an image. This thesis presents novel ideas for semantic-spatial content capture and local-global feature fusion and applies them to scene recognition. It shows how the proper use of these approaches, without trying to recognize objects in the scene images, can improve recognition accuracy in scene classification.

First, we propose a method to build a semantic visual vocabulary. Features are extracted from image patches, and an initial vocabulary is constructed by performing k-means clustering on the extracted features and taking the cluster centers as the visual words. The feature vectors are quantized against the initial vocabulary to form a word-image matrix that describes the occurrence of words in the images. The codebook is then embedded into a concept space by a latent semantic model. We demonstrate this embedding using Latent Semantic Analysis (LSA) as well as Probabilistic Latent Semantic Analysis (pLSA). In the proposed space, the distances between words represent semantic distances, which are used to construct a discriminative and semantically meaningful vocabulary.

The main contributions of the first chapter are as follows:

1. Using the semantic word space to co-cluster similar words into a semantic visual vocabulary. This improves the results compared to other methods that use the document space directly after pLSA embedding.
2. Investigating the effect of changing the number of latent variables.
3. Using LSA embedding, whereas all other vision systems to date use only pLSA.

This method has shown promising results on the 15-Scene categories dataset when the proposed model extracts a single type of visual feature.
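The vocabulary-building steps described above (k-means on patch features, quantization into a word-image matrix, embedding of the words into a concept space, and grouping of nearby words) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the thesis implementation: it uses LSA computed by a truncated SVD of the raw count matrix, it groups words with a second plain k-means in the concept space, and all function and parameter names (kmeans, word_image_matrix, lsa_word_embedding, semantic_vocabulary, n_initial, n_concepts, n_semantic) are hypothetical. Any term weighting, the pLSA variant, and the exact co-clustering procedure are not specified in the abstract and are left out.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the last set of centers.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def word_image_matrix(word_ids, image_ids, n_words, n_images):
    """Count how often each visual word occurs in each image."""
    M = np.zeros((n_words, n_images))
    np.add.at(M, (word_ids, image_ids), 1.0)
    return M


def lsa_word_embedding(M, n_concepts):
    """Embed words (rows of M) into a concept space via truncated SVD."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :n_concepts] * s[:n_concepts]


def semantic_vocabulary(features, image_ids, n_initial, n_concepts,
                        n_semantic, seed=0):
    """Return (initial word -> semantic word map, per-image histograms)."""
    n_images = int(image_ids.max()) + 1
    # 1. Initial vocabulary: cluster centers of the patch features.
    _, word_ids = kmeans(features, n_initial, seed=seed)
    # 2. Word-image occurrence matrix.
    M = word_image_matrix(word_ids, image_ids, n_initial, n_images)
    # 3. Concept-space embedding of the initial words (LSA assumed here).
    emb = lsa_word_embedding(M, n_concepts)
    # 4. Group words that lie close together in the concept space.
    _, word_to_semantic = kmeans(emb, n_semantic, seed=seed)
    # 5. Re-quantize every patch with the merged (semantic) vocabulary.
    semantic_ids = word_to_semantic[word_ids]
    hist = word_image_matrix(semantic_ids, image_ids, n_semantic, n_images).T
    return word_to_semantic, hist


if __name__ == "__main__":
    # Toy data: 10 images with 20 random 16-dimensional patch features each.
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(200, 16))
    img = np.repeat(np.arange(10), 20)
    mapping, hist = semantic_vocabulary(feats, img, n_initial=20,
                                        n_concepts=5, n_semantic=6)
    print(hist.shape)
```

The resulting per-image histograms over the merged vocabulary could then be passed to any standard classifier. Note that n_concepts must not exceed the smaller of the number of initial words and the number of images, since the SVD yields at most that many components.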
Second, since fusing local and global features benefits the performance of scene categorization systems [7], we propose a novel Local-Global Feature Fusion (LGFF) method that can fuse latent semantic patches adaptively.

[1]  Dennis Gabor,et al.  Communication theory and physics , 1953, Trans. IRE Prof. Group Inf. Theory.

[2]  Mubarak Shah,et al.  Learning human actions via information maximization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Konstantinos N. Plataniotis,et al.  Distance measures for color image retrieval , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[4]  Laurent Itti,et al.  Mobile robot vision navigation & localization using Gist and Saliency , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[5]  W. Karush Minima of Functions of Several Variables with Inequalities as Side Conditions , 2014 .

[6]  Nelson H. C. Yung,et al.  Feature fusion within local region using localized maximum-margin learning for scene categorization , 2012, Pattern Recognit..

[7]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[8]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[11]  Yang Yang,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, CVPR.

[12]  Raphaël Marée,et al.  Random subwindows for robust image classification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Aleksandra Mojsilovic,et al.  ISee: perceptual features for image library navigation , 2002, IS&T/SPIE Electronic Imaging.

[14]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Svetlana Lazebnik,et al.  Understanding scenes on many levels , 2011, 2011 International Conference on Computer Vision.

[16]  Stefano Soatto,et al.  Features for recognition: viewpoint invariance for non-planar scenes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[18]  Yangyu Fan,et al.  Fusion of Global and Local Feature Using KCCA for Automatic Target Recognition , 2009, 2009 Fifth International Conference on Image and Graphics.

[19]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[20]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories , 2006 .

[21]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[22]  Jean Ponce,et al.  Local, semi-local and global models for texture, object and scene recognition , 2006 .

[23]  Antonio Torralba,et al.  Nonparametric Scene Parsing via Label Transfer , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Binhai Zhu,et al.  Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm , 2000, Information Retrieval.

[25]  Luc Vincent,et al.  Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Bernt Schiele,et al.  Natural Scene Retrieval Based on a Semantic Modeling Step , 2004, CIVR.

[27]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[28]  Ying Liu,et al.  A survey of content-based image retrieval with high-level semantics , 2007, Pattern Recognit..

[29]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[30]  Nuno Vasconcelos,et al.  Scene Recognition on the Semantic Manifold , 2012, ECCV.

[31]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[32]  Naphtali Rishe,et al.  Content-based image retrieval , 1995, Multimedia Tools and Applications.

[33]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[34]  B. S. Manjunath,et al.  Edge flow: A framework of boundary detection and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  D. T. Lee,et al.  Boosted Multiple Kernel Learning for Scene Category Recognition , 2010, 2010 20th International Conference on Pattern Recognition.

[37]  James Ze Wang,et al.  IRM: integrated region matching for image retrieval , 2000, ACM Multimedia.

[38]  Francesca Odone,et al.  Building kernels from binary strings for image matching , 2005, IEEE Transactions on Image Processing.

[39]  Illah R. Nourbakhsh,et al.  Appearance-based place recognition for topological localization , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065).

[40]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[41]  Ramin Zabih,et al.  Non-parametric Local Transforms for Computing Visual Correspondence , 1994, ECCV.

[42]  N. H. C. Yung,et al.  Scene categorization via contextual visual words , 2010, Pattern Recognit..

[43]  Tsuhan Chen,et al.  Unsupervised Image Categorization and Object Localization using Topic Models and Correspondences between Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[44]  Max A. Viergever,et al.  General intensity transformations and differential invariants , 1994, Journal of Mathematical Imaging and Vision.

[45]  Barbara Caputo,et al.  Visual Servoing to Help Camera Operators Track Better , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[46]  Laurent Itti,et al.  Gist: A Mobile Robotics Application of Context-Based Vision in Outdoor Environment , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[47]  Raj Acharya,et al.  Color clustering techniques for color-content-based image retrieval from image databases , 1997, Proceedings of IEEE International Conference on Multimedia Computing and Systems.

[48]  Cordelia Schmid,et al.  Vector Quantizing Feature Space with a Regular Lattice , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[49]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[50]  Peter K. Allen,et al.  Topological mobile robot localization using fast vision techniques , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[51]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[52]  James M. Rehg,et al.  Visual place categorization , 2009 .

[53]  Wanqing Li,et al.  Incorporating local and global information using a novel distance function for scene recognition , 2013, 2013 IEEE Workshop on Robot Vision (WORV).

[54]  Jeff A. Bilmes,et al.  Object class recognition using images of abstract regions , 2004, ICPR 2004.

[55]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Marie-Pierre Jolly,et al.  Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images , 2001, ICCV.

[57]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[58]  Thomas S. Huang,et al.  Relevance feedback in image retrieval: A comprehensive review , 2003, Multimedia Systems.

[59]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[60]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[61]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[62]  Jinfeng Yang,et al.  Feature-level fusion of global and local features for finger-vein recognition , 2010, IEEE 10th INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS.

[63]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[65]  Ilaria Bartolini,et al.  Windsurf: region-based image retrieval using wavelets , 1999, Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99.

[66]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[67]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[68]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[69]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[70]  Sabine Süsstrunk,et al.  Multi-spectral SIFT for scene category recognition , 2011, CVPR 2011.

[71]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[72]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[73]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[74]  Andrew Zisserman,et al.  Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?" , 2002, ECCV.

[75]  Shih-Fu Chang,et al.  Semantic visual templates: linking visual features to semantics , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[76]  Jiebo Luo,et al.  Exploiting context for semantic scene classification , 2005 .

[77]  Mubarak Shah,et al.  Scene Modeling Using Co-Clustering , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[78]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[79]  Yan Liu,et al.  A new method of feature fusion and its application in image recognition , 2005, Pattern Recognit..

[80]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[81]  Dorin Comaniciu,et al.  Robust analysis of feature spaces: color image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[82]  William C. Davidon,et al.  Variable Metric Method for Minimization , 1959, SIAM J. Optim..

[83]  Petra Perner,et al.  Prototype-based classification , 2008, Applied Intelligence.

[84]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[85]  Baitao Li Chang,et al.  DPF - a perceptual distance function for image retrieval , 2002, Proceedings. International Conference on Image Processing.

[86]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[87]  Michael G. Strintzis,et al.  An ontology approach to object-based image retrieval , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[88]  Edward H. Adelson,et al.  The Design and Use of Steerable Filters , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[89]  Brijesh Verma,et al.  Fuzzy logic based texture queries for CBIR , 2003, Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003.

[90]  Pedro F. Felzenszwalb,et al.  Reconfigurable models for scene recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[91]  Ling Shao An Efficient Local Invariant Region Detector for Image Retrieval , 2009, 2009 Canadian Conference on Computer and Robot Vision.

[92]  Tom Drummond,et al.  Fusing points and lines for high performance tracking , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[93]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[94]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[95]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[96]  J. Koenderink,et al.  Representation of local geometry in the visual system , 1987, Biological Cybernetics.

[97]  Nuno Vasconcelos,et al.  A multiresolution manifold distance for invariant image similarity , 2005, IEEE Transactions on Multimedia.

[98]  Deepu Rajan,et al.  Embedding Visual Words into Concept Space for Action and Scene Recognition , 2010, BMVC.

[99]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[100]  Yi Li,et al.  A generative/discriminative learning algorithm for image classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[101]  Joo-Hwee Lim,et al.  Scene Recognition with Camera Phones for Tourist Information Access , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[102]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[103]  Sabine Süsstrunk,et al.  Eigenregions for image classification , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[104]  C. G. Broyden The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations , 1970 .

[105]  Anil K. Jain,et al.  On image classification: city vs. landscape , 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. No.98EX173).

[106]  Yves Grandvalet,et al.  More efficiency in multiple kernel learning , 2007, ICML '07.

[107]  Andrew E. Johnson,et al.  Recognizing objects by matching oriented points , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[108]  B. S. Manjunath,et al.  Unsupervised Segmentation of Color-Texture Regions in Images and Video , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[109]  Toshikazu Kato,et al.  Query by Visual Example - Content based Image Retrieval , 1992, EDBT.

[110]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[111]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[112]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[113]  Nicholas Roy,et al.  Indoor scene recognition through object detection , 2010, 2010 IEEE International Conference on Robotics and Automation.

[114]  Natalia Vassilieva Content-based image retrieval methods , 2009, Programming and Computer Software.

[115]  Andrew Zisserman,et al.  A Statistical Approach to Texture Classification from Single Images , 2004, International Journal of Computer Vision.

[116]  W. Clem Karl,et al.  A curve evolution approach for image segmentation using adaptive flows , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[117]  Shih-Fu Chang,et al.  A knowledge engineering approach for image classification based on probabilistic reasoning systems , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[118]  Bo Zhang,et al.  Learning in Region-Based Image Retrieval , 2003, CIVR.

[119]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[120]  Anil K. Jain,et al.  Image classification for content-based indexing , 2001, IEEE Trans. Image Process..

[121]  Xin Li,et al.  An Object Co-occurrence Assisted Hierarchical Model for Scene Understanding , 2012, BMVC.

[122]  Chin-Liang Chang,et al.  Finding Prototypes For Nearest Neighbor Classifiers , 1974, IEEE Transactions on Computers.

[123]  Luc Van Gool,et al.  Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions , 2000, BMVC.

[124]  Jianxin Wu,et al.  Power mean SVM for large scale visual classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[125]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.