Adaptive scene dependent filters for segmentation and online learning of visual objects

We propose the adaptive scene dependent filter (ASDF) hierarchy for unsupervised learning of image segmentation, which integrates several processing pathways into a flexible, highly dynamic, and real-time capable vision architecture. It is based on forming a combined feature space from basic feature maps such as color, disparity, and pixel position. To guarantee real-time performance, we apply an enhanced vector quantization method to partition this feature space. The learned codebook defines a corresponding best-match segment for each prototype and yields an over-segmentation of the object and its surround. The segments are recombined into a final object segmentation mask based on a relevance map, which encodes a coarse bottom-up hypothesis of where the object is located in the image. We apply the ASDF hierarchy to preprocess input images in a feature-based, biologically motivated object recognition and learning architecture, and show experiments with this real-time vision system running at 6 Hz, including the online learning of the segmentation. Because interaction with the user is imperfect, the real-world system acquires useful views at only about 1.5 Hz, but we show that one hundred views, requiring only one minute of interaction time, are sufficient for training a new object.
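The pipeline described above (combined feature space, vector quantization, best-match segments, relevance-based recombination) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the position weight, the grid initialization of the codebook, the plain winner-take-all update (standing in for the enhanced vector quantization method, which the abstract does not specify), the mean-relevance threshold used to select segments, and the synthetic toy scene.

```python
import numpy as np

def build_features(color, disparity, pos_weight=0.3):
    """Stack color (h, w, 3), disparity (h, w) and weighted pixel position
    into one feature vector per pixel; returns an (h, w, 6) array."""
    h, w, _ = color.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([xs / max(w - 1, 1), ys / max(h - 1, 1)], axis=-1)
    return np.concatenate([color, disparity[..., None], pos_weight * pos], axis=-1)

def learn_codebook(feats_img, n_side=3, n_steps=3000, lr=0.05, seed=0):
    """Plain online winner-take-all vector quantization (assumed stand-in).
    Prototypes start at pixels on a regular n_side x n_side grid."""
    h, w, d = feats_img.shape
    rows = ((np.arange(n_side) + 0.5) * h / n_side).astype(int)
    cols = ((np.arange(n_side) + 0.5) * w / n_side).astype(int)
    codebook = feats_img[np.ix_(rows, cols)].reshape(-1, d).copy()
    flat = feats_img.reshape(-1, d)
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        x = flat[rng.integers(len(flat))]                # random pixel sample
        winner = np.argmin(((codebook - x) ** 2).sum(axis=1))
        codebook[winner] += lr * (x - codebook[winner])  # move winner toward sample
    return codebook

def best_match_segments(feats_img, codebook):
    """Label every pixel with the index of its nearest prototype."""
    h, w, d = feats_img.shape
    flat = feats_img.reshape(-1, d)
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1).reshape(h, w)

def recombine(labels, relevance, n_prototypes, threshold=0.5):
    """Keep each segment whose mean relevance exceeds the threshold
    (an assumed selection rule) and merge the kept segments into one mask."""
    mask = np.zeros(labels.shape, dtype=bool)
    for k in range(n_prototypes):
        seg = labels == k
        if seg.any() and relevance[seg].mean() > threshold:
            mask |= seg
    return mask

def make_toy_scene(size=32):
    """Synthetic scene: a near, red square on a far, blue background,
    plus a coarse relevance box that is slightly larger than the square."""
    color = np.zeros((size, size, 3))
    color[...] = (0.1, 0.1, 0.8)
    disparity = np.full((size, size), 0.1)
    obj = np.zeros((size, size), dtype=bool)
    obj[10:22, 10:22] = True
    color[obj] = (0.9, 0.1, 0.1)
    disparity[obj] = 0.9
    relevance = np.zeros((size, size))
    relevance[8:24, 8:24] = 1.0
    return color, disparity, relevance, obj

def segment(color, disparity, relevance):
    """Run the full sketch: features, codebook, segments, final mask."""
    feats = build_features(color, disparity)
    codebook = learn_codebook(feats)
    labels = best_match_segments(feats, codebook)
    return recombine(labels, relevance, len(codebook))

if __name__ == "__main__":
    color, disparity, relevance, obj = make_toy_scene()
    mask = segment(color, disparity, relevance)
    print("object pixels in mask:", int(mask.sum()))
```

In this toy scene the relevance box deliberately overshoots the object. The recombination step drops the background segments that only partly overlap the box, which is the sense in which a coarse hypothesis can be refined by the over-segmentation.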
