Concurrent learning of visual codebooks and object categories in open-ended domains

In open-ended domains, robots must continuously learn new object categories. When the training sets are created offline, it is not possible to ensure their representativeness with respect to the object categories and features the system will find when operating online. In the Bag of Words model, visual codebooks are usually constructed from training sets created offline. This might lead to non-discriminative visual words and, as a consequence, to poor recognition performance. This paper proposes a visual object recognition system which concurrently learns in an incremental and online fashion both the visual object category representations as well as the codebook words used to encode them. The codebook is defined using Gaussian Mixture Models which are updated using new object views. The approach contains similarities with the human visual object recognition system: evidence suggests that the development of recognition capabilities occurs on multiple levels and is sustained over large periods of time. Results show that the proposed system with concurrent learning of object categories and codebooks is capable of learning more categories, requiring less examples, and with similar accuracies, when compared to the classical Bag of Words approach using codebooks constructed offline.

[1]  GeversTheo,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010 .

[2]  Florent Perronnin,et al.  Universal and Adapted Vocabularies for Generic Visual Categorization , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Miao-Jun Hu,et al.  The Comparison of Classifiers for Object Categorization Based on Bag-of-Word Technology , 2009, 2009 Chinese Conference on Pattern Recognition.

[5]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[6]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[7]  Cristina Montomoli,et al.  The Development of Visual Object Recognition in School-Age Children , 2007, Developmental neuropsychology.

[8]  Marc Sebban,et al.  Supervised learning of Gaussian mixture models for visual vocabulary generation , 2012, Pattern Recognit..

[9]  Marlene Behrmann,et al.  Development of object recognition in humans , 2009, F1000 biology reports.

[10]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[11]  Justus H. Piater,et al.  Online Learning of Gaussian Mixture Models - a Two-Level Approach , 2008, VISAPP.

[12]  Rafael García,et al.  Automatic Visual Bag-of-Words for Online Robot Navigation and Mapping , 2012, IEEE Transactions on Robotics.

[13]  Qi Tian,et al.  Category sensitive codebook construction for object category recognition , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[14]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[15]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2009, Image Vis. Comput..

[17]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Horst-Michael Groß,et al.  A life-long learning vector quantization approach for interactive learning of multiple categories , 2012, Neural Networks.

[19]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[20]  Gi Hyun Lim,et al.  A perceptual memory system for grounding semantic representations in intelligent service robots , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[21]  L. Seabra Lopes,et al.  How many words can my robot learn?: An approach and experiments with one-class learning , 2007 .

[22]  Rong Jin,et al.  Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[25]  Luís Seabra Lopes,et al.  Using spoken words to guide open-ended category formation , 2011, Cognitive Processing.

[26]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[27]  Kjersti Engan,et al.  Recursive Least Squares Dictionary Learning Algorithm , 2010, IEEE Transactions on Signal Processing.

[28]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[29]  Rong Jin,et al.  Discriminative Cluster Refinement: Improving Object Category Recognition Given Limited Training Data , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Martin Jüttner,et al.  A developmental dissociation of view-dependent and view-invariant object recognition in adolescence , 2006, Behavioural Brain Research.

[31]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[32]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).