The Pascal Visual Object Classes Challenge: A Retrospective

The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008–2012. The paper is intended for two audiences: algorithm designers, researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms; and, challenge designers, who want to see what we as organisers have learnt from the process and our recommendations for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant or not; a normalised average precision so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms so that the hard and easy images can be identified; and the use of a joint classifier over the submitted algorithms in order to measure their complementarity and combined performance. We also analyse the community’s progress through time using the methods of Hoiem et al. (Proceedings of European Conference on Computer Vision, 2012) to identify the types of occurring errors. We conclude the paper with an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.

[1]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[2]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[3]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[4]  Rob J Hyndman,et al.  Nonparametric confidence intervals for receiver operating characteristic curves , 2004 .

[5]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[6]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[7]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[8]  Larry Wasserman,et al.  All of Statistics , 2004 .

[9]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[12]  Stéphan Clémençon,et al.  On Bootstrapping the ROC Curve , 2008, NIPS.

[13]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[15]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[21]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[22]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[23]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[24]  Fei-Fei Li,et al.  Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[25]  Cristian Sminchisescu,et al.  Probabilistic Joint Image Segmentation and Labeling , 2011, NIPS.

[26]  Cristian Sminchisescu,et al.  Image segmentation by figure-ground composition into maximal cliques , 2011, 2011 International Conference on Computer Vision.

[27]  Jan C. van Gemert,et al.  Exploiting photographic style for category-level image classification by generalizing the spatial pyramid , 2011, ICMR.

[28]  Fahad Shahbaz Khan,et al.  Modulating Shape Features by Color Attention for Object Recognition , 2012, International Journal of Computer Vision.

[29]  Fahad Shahbaz Khan,et al.  Color attributes for object detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Anil A. Bharath,et al.  Efficient Kernels Couple Visual Words Through Categorical Opponency , 2012, BMVC.

[31]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[32]  N. Sudhakaran,et al.  Best practice guidelines: fetal surgery. , 2012, Early human development.

[33]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[34]  Andrew W. Fitzgibbon,et al.  In Memoriam: Mark Everingham , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  Theo Gevers,et al.  Object Reading: Text Recognition for Object Recognition , 2012, ECCV Workshops.

[36]  Derek Hoiem,et al.  Diagnosing Error in Object Detectors , 2012, ECCV.

[37]  Loong Fah Cheong,et al.  Segmentation over Detection by Coupled Global and Local Sparse Representations , 2012, ECCV.

[38]  Nazli Ikizler-Cinbis,et al.  On Recognizing Actions in Still Images via Multiple Features , 2012, ECCV Workshops.

[39]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[40]  Shuicheng Yan,et al.  Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[42]  Cristian Sminchisescu,et al.  Composite Statistical Inference for Semantic Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Loris Nanni,et al.  Heterogeneous bag-of-features for object/scene recognition , 2013, Appl. Soft Comput..

[44]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[45]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[47]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Qiang Chen,et al.  Contextualizing Object Detection and Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Wallace S. Rutkowski,et al.  TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE , 2022 .