Developmentally deep perceptual system for a humanoid robot

This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply ‘pokes around’ in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively.

The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity.

This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy, and study these situations to find correlated features that can be observed more generally.
