Scene image categorization and video event detection using Naive Bayes Nearest Neighbor

We present a detailed study of Naive Bayes Nearest Neighbor (NBNN) proposed by Boiman et al., with application to scene categorization and video event detection. Our study indicates that using Dense-SIFT along with dimensionality reduction using PCA enables NBNN to obtain state-of-the-art results. We demonstrate this on two tasks: (1) scene image categorization on the UIUC 8 Sports Events Image Dataset (obtaining 84.67%) and the MIT 67 Indoor Scene Image Dataset (obtaining 48.84%); and (2) detecting videos depicting certain events of interest on the challenging MED'11 video dataset with only 15 positive training videos per event. We present an extension referred to as sparse-NBNN that constrains the number of training images that can used to match with a given test image for the image-to-class distance computation. Experiments indicate that this improves upon NBNN for handling of imbalanced training data.

[1]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Mubarak Shah,et al.  Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching , 2010, TRECVID.

[3]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  David G. Lowe,et al.  Local Naive Bayes Nearest Neighbor for image classification , 2011, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Nuno Vasconcelos,et al.  Adapted Gaussian models for image classification , 2011, CVPR 2011.

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Luciano Sbaiz,et al.  Finding meaning on YouTube: Tag recommendation and category discovery , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  James M. Rehg,et al.  Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Grant Schindler,et al.  Internet video category recognition , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[10]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Özgür Ulusoy,et al.  Nearest-Neighbor based Metric Functions for indoor scene recognition , 2011, Comput. Vis. Image Underst..

[12]  James M. Rehg,et al.  Movie genre classification via scene categorization , 2010, ACM Multimedia.

[13]  Pradeep Natarajan,et al.  Efficient Orthogonal Matching Pursuit using sparse random projections for scene and video classification , 2011, 2011 International Conference on Computer Vision.

[14]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[15]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[16]  Nanda Kambhatla,et al.  Dimension Reduction by Local Principal Component Analysis , 1997, Neural Computation.

[17]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[18]  Alexei A. Efros,et al.  Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph..

[19]  Yan Ke,et al.  Efficient Near-duplicate Detection and Sub-image Retrieval , 2004 .

[20]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[21]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Trevor Darrell,et al.  The NBNN kernel , 2011, 2011 International Conference on Computer Vision.

[26]  Baoxin Li,et al.  YouTubeCat: Learning to categorize wild web videos , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.