Semantic Segmentation with Second-Order Pooling

Feature extraction, coding and pooling, are important components on many contemporary object recognition paradigms. In this paper we explore novel pooling techniques that encode the second-order statistics of local descriptors inside a region. To achieve this effect, we introduce multiplicative second-order analogues of average and max-pooling that together with appropriate non-linearities lead to state-of-the-art performance on free-form region recognition, without any type of feature coding. Instead of coding, we found that enriching local descriptors with additional image information leads to large performance gains, especially in conjunction with the proposed pooling methodology. We show that second-order pooling over free-form regions produces results superior to those of the winning systems in the Pascal VOC 2011 semantic segmentation challenge, with models that are 20,000 times faster.

[1]  Cordelia Schmid,et al.  Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Matti Pietikäinen,et al.  A Generalized Local Binary Pattern Operator for Multiresolution Gray Scale and Rotation Invariant Texture Classification , 2001, ICAPR.

[3]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Nicholas J. Higham,et al.  A Schur-Parlett Algorithm for Computing Matrix Functions , 2003, SIAM J. Matrix Anal. Appl..

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[8]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[9]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[12]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[13]  Pietro Perona,et al.  Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition , 2007, International Journal of Computer Vision.

[14]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Nicholas Ayache,et al.  Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices , 2007, SIAM J. Matrix Anal. Appl..

[16]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[17]  R. Bhatia Positive Definite Matrices , 2007 .

[18]  Fatih Murat Porikli,et al.  Human Detection via Classification on Riemannian Manifolds , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[20]  Fatih Murat Porikli,et al.  Pedestrian Detection via Classification on Riemannian Manifolds , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[25]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[26]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[27]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[29]  Joost van de Weijer,et al.  Harmony potentials for joint classification and segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Philip H. S. Torr,et al.  What, Where & How Many? Combining Object Detectors and CRFs , 2010 .

[32]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[33]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[34]  B. Caputo,et al.  A Performance Evaluation of Exact and Approximate Match Kernels for Object Recognition , 2010 .

[35]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[36]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[37]  Cristian Sminchisescu,et al.  Object Recognition by Sequential Figure-Ground Ranking , 2012, International Journal of Computer Vision.

[38]  Jean Ponce,et al.  A graph-matching kernel for object categorization , 2011, 2011 International Conference on Computer Vision.

[39]  Rui Caseiro,et al.  A nonparametric Riemannian framework on tensor field with application to foreground segmentation , 2011, 2011 International Conference on Computer Vision.

[40]  Cristian Sminchisescu,et al.  Probabilistic Joint Image Segmentation and Labeling , 2011, NIPS.

[41]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[42]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[43]  Jitendra Malik,et al.  Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.