From passive to interactive object learning and recognition through self-identification on a humanoid robot

Service robots working in evolving human environments need the ability to continuously learn to recognize new objects. Ideally, they should act as humans do, by observing their environment and interacting with objects, without specific supervision. Taking inspiration from infant development, we propose a developmental approach that enables a robot to progressively learn object appearances in a social environment: first through observation only, then through active object manipulation. We focus on incremental, continuous, and unsupervised learning that does not require prior knowledge about the environment or the robot. In the first phase, we analyse the visual space and detect proto-objects as units of attention that are learned and recognized as possible physical entities. The appearance of each entity is represented as a multi-view model based on complementary visual features. In the second phase, entities are classified into three categories: parts of the robot's body, parts of a human partner, and manipulable objects. The categorization approach is based on the mutual information between visual and proprioceptive data, and on the motion behaviour of entities. The ability to categorize entities is then used during interactive object exploration to improve the previously acquired object models. The proposed system is implemented and evaluated with an iCub and a Meka robot learning 20 objects. The system recognizes objects with an 88.5% success rate and creates coherent representation models that are further improved by interactive learning.
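
The mutual-information criterion used for categorization can be illustrated with a small sketch. It is not the system's implementation: it assumes that the visual and proprioceptive streams have each been reduced to a binary "moving / not moving" signal per time step, and the function name `mutual_information` and the example sequences are hypothetical. Under that assumption, an entity whose visual motion is highly informative about the robot's own joint motion is a candidate body part, while an entity whose motion is independent of it is not.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two
    equal-length sequences of discrete symbols."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of co-occurring symbol pairs
    px = Counter(xs)              # marginal counts of the first signal
    py = Counter(ys)              # marginal counts of the second signal
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical binary signals: 1 = motion detected at that time step.
arm_joints_moving = [0, 1, 1, 0, 1, 0, 0, 1]  # proprioception
hand_entity_moving = [0, 1, 1, 0, 1, 0, 0, 1]  # visual entity following the arm
cup_entity_moving = [0, 0, 1, 1, 0, 0, 1, 1]   # visual entity moving independently

mi_hand = mutual_information(arm_joints_moving, hand_entity_moving)
mi_cup = mutual_information(arm_joints_moving, cup_entity_moving)
```

In this toy data the hand-like entity shares one full bit of information with the arm signal, whereas the independently moving entity shares none. A real system would work with noisy, continuous measurements and would need a threshold and additional cues, such as the motion behaviour of entities mentioned above, to separate body parts, human parts, and manipulable objects.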
