SPLeaP: Soft Pooling of Learned Parts for Image Classification

The aggregation of image statistics (the so-called pooling step of image classification algorithms) and the construction of part-based models are two distinct and well-studied topics in the literature. The former aims to exploit the whole set of local descriptors an image contains (for instance through spatial pyramids or Fisher vectors), while the latter rests on the premise that only a few of an image's regions are actually useful for classifying it. This paper bridges the two by proposing a new pooling framework built on discovering which parts are useful when pooling local representations. The key contribution is a model that integrates a boosted non-linear part classifier and a parametric soft-max pooling component, both trained jointly with the image classifier. Experiments show that the proposed model not only consistently outperforms standard pooling approaches but also improves on state-of-the-art part-based models across several challenging classification tasks.
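
The abstract names two components, a boosted part classifier that scores image regions and a parametric soft-max pooling that aggregates those scores. The sketch below is a hypothetical illustration of how such components could fit together at inference time, not the authors' implementation: the decision-stump weak learners, the softmax-weighted-average form of the pooling, and the names `boosted_part_score`, `softmax_pool` and `image_score` are all assumptions, and the joint training step is omitted.

```python
import math
from typing import Sequence, Tuple

# Hypothetical weak learner: a decision stump (feature index, threshold, weight).
# Used here only as a stand-in for the paper's boosted non-linear part classifier.
Stump = Tuple[int, float, float]


def boosted_part_score(region: Sequence[float], stumps: Sequence[Stump]) -> float:
    """Score one region descriptor with an additive (boosted) ensemble of stumps."""
    return sum(w if region[d] > t else -w for d, t, w in stumps)


def softmax_pool(scores: Sequence[float], beta: float) -> float:
    """Softmax-weighted average of region scores (assumed pooling form).

    beta = 0 gives the plain mean (average pooling); a large positive beta
    approaches the maximum score (max pooling).
    """
    if not scores:
        raise ValueError("need at least one region score")
    logits = [beta * s for s in scores]
    top = max(logits)  # shift logits so exp() cannot overflow
    weights = [math.exp(z - top) for z in logits]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def image_score(
    regions: Sequence[Sequence[float]],
    parts: Sequence[Tuple[Sequence[Stump], float]],
    part_weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Linear image classifier over soft-pooled part responses.

    parts: one (stumps, beta) pair per learned part; each part scores every
    region, and its region scores are pooled with its own beta.
    """
    pooled = [
        softmax_pool([boosted_part_score(r, stumps) for r in regions], beta)
        for stumps, beta in parts
    ]
    return sum(w * p for w, p in zip(part_weights, pooled)) + bias
```

With `beta = 0` the pooled value is the mean of the region scores, and as `beta` grows it approaches the score of the best region. Giving each part its own learnable `beta` is one way a model could let training decide, per part, where to sit between average and max pooling.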
