Where Next in Object Recognition and How Much Supervision Do We Need?

Object class recognition is an active topic in computer vision that still presents many challenges. Most approaches address this task with supervised learning algorithms that need a large number of labels to perform well. This leads either to small datasets (<10,000 images) that capture only a subset of the real-world class distribution but have a controlled and verified labeling procedure, or to large datasets that are more representative but contain more label noise. Semi-supervised learning has therefore been established as a promising direction for object recognition: it requires only a few labels while making use of the vast number of images available today. In this chapter, we outline the main challenges of semi-supervised object recognition, review existing approaches, and emphasize open issues that should be addressed next to advance this research topic.
