Integrating Randomization and Discrimination for Classifying Human-Object Interaction Activities

In this chapter we study the problem of classifying human–object interaction activities in still images. The goal of our method is to explore fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining two ideas: discriminative feature mining and randomization. Discriminative feature mining lets us model the detailed information that distinguishes different classes of images, while randomization lets us handle the huge feature space and prevents over-fitting. We propose a random forest of discriminative decision trees, in which every tree node is a discriminative classifier trained by combining the information in that node with the information in all of its upstream nodes. Besides human action recognition in still images, we also evaluate our method on subordinate categorization. Experimental results show that our method identifies semantically meaningful visual information and outperforms state-of-the-art algorithms on various datasets. Using our method, we achieved the best results and won the award in the PASCAL VOC action classification challenges in 2011 and 2012.
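The idea of a forest whose tree nodes are themselves discriminative classifiers can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes plain feature vectors in place of image patches, uses a ridge-regression linear classifier as a stand-in for whatever node classifier the method actually trains, and models "combining information from upstream nodes" simply as reusing the feature dimensions chosen by ancestor nodes. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(X, t, reg=1.0):
    """Ridge-regression stand-in for the discriminative node classifier."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ t)                # weights, bias last

def build_tree(X, y, n_classes, rng, upstream=(), depth=0, max_depth=4, n_sample=3):
    counts = np.bincount(y, minlength=n_classes).astype(float)
    classes = np.unique(y)
    if depth == max_depth or len(classes) < 2:
        return {"leaf": counts / counts.sum()}         # class distribution at the leaf
    # Randomization: sample a few feature dimensions (assumed proxy for image patches).
    new = rng.choice(X.shape[1], size=n_sample, replace=False)
    # Assumed form of "upstream information": also reuse the ancestors' dimensions.
    feats = np.union1d(np.asarray(upstream, dtype=int), new)
    # Randomly group the classes present into two sets to get a binary problem.
    perm = rng.permutation(classes)
    t = np.where(np.isin(y, perm[: len(perm) // 2]), 1.0, -1.0)
    # Discrimination: train a linear classifier on the selected dimensions.
    w = fit_linear(X[:, feats], t)
    left = X[:, feats] @ w[:-1] + w[-1] < 0
    if left.all() or not left.any():                   # degenerate split: stop here
        return {"leaf": counts / counts.sum()}
    args = (n_classes, rng, feats, depth + 1, max_depth, n_sample)
    return {"feats": feats, "w": w,
            "left": build_tree(X[left], y[left], *args),
            "right": build_tree(X[~left], y[~left], *args)}

def tree_predict(node, x):
    """Route one sample down the tree and return the leaf's class distribution."""
    while "leaf" not in node:
        f, w = node["feats"], node["w"]
        node = node["left"] if x[f] @ w[:-1] + w[-1] < 0 else node["right"]
    return node["leaf"]

def forest_fit(X, y, n_trees=20, seed=0, **kw):
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    return [build_tree(X, y, n_classes, rng, **kw) for _ in range(n_trees)]

def forest_predict(forest, X):
    # Average the leaf distributions over all trees, then take the best class.
    probs = np.array([[tree_predict(t, x) for t in forest] for x in X])
    return probs.mean(axis=1).argmax(axis=1)

# Synthetic three-class data: class k is "on" in dimensions k and k + 3.
data_rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 60)
means = 5.0 * np.eye(3)[:, np.arange(6) % 3]
X = means[y] + data_rng.normal(size=(len(y), 6))

forest = forest_fit(X, y, n_trees=20, seed=0)
pred = forest_predict(forest, X)
acc = float((pred == y).mean())
print(f"training accuracy: {acc:.3f}")
```

In this sketch, randomization enters through the sampled feature dimensions and the random grouping of classes at each node, while discrimination enters through the trained linear split; swapping `fit_linear` for a different classifier would leave the rest of the structure unchanged.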
