A Survey of Vision-Based Architectures for Robot Learning by Imitation

Learning by imitation is a natural and intuitive way to teach social robots new behaviors. Although these learning systems can use different sensory inputs, vision is often their main, or even their only, source of input data. Many vision-based robot learning by imitation (RLbI) architectures have been proposed in the last decade, but they can be difficult to compare because they lack a common, structured description. The first contribution of this survey is the definition of a set of standard components that can be used to describe any RLbI architecture. The second contribution is an analysis of how different vision-based architectures implement and connect these components. This bottom-up, structural analysis makes it possible to compare different solutions and to highlight their main advantages and drawbacks from a more flexible perspective than a comparison of monolithic systems would allow.
