FOCUS: a generalized method for object discovery for robots that observe and interact with humans

The signal-to-symbol problem consists of associating a symbolic description of an object (e.g., a chair) with a signal (e.g., an image) that captures the real object. Robots that interact with humans in natural environments must solve this problem correctly and robustly. However, providing a robot with complete object models a priori, so that it can understand its environment from any viewpoint, is extremely difficult. Additionally, many objects have several uses, which can cause ambiguity when a robot reasons about a human's activities and interactions with those objects. In this paper, we build on the observation that robots that co-exist with humans should be able to observe humans using different objects and learn the corresponding object definitions. We contribute an object recognition algorithm, FOCUS, that is robust to variations in the signal, combines the structure and function of an object, and generalizes to multiple similar objects. FOCUS, which stands for Finding Object Classification through Use and Structure, combines an activity recognizer that captures how an object is used with a traditional visual structure processor. FOCUS learns the structural properties (visual features) of an object by first knowing the object's affordance properties and then observing humans interacting with that object through known activities. The strength of the method is that it defines multiple aspects of an object model, i.e., structure and use, that are individually robust but insufficient to define the object, yet sufficient when combined.
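The idea of learning structure from use can be illustrated with a minimal sketch. This is not the FOCUS implementation: the affordance table, the `ObjectLearner` class, the grayscale histogram feature, and the intersection threshold are all hypothetical choices made only to show how a recognized activity might supply the label for the visual features of the region a human interacted with.

```python
from collections import defaultdict

# Hypothetical affordance table: the object class implied by a recognized activity.
AFFORDANCES = {"sitting": "chair", "walking_through": "door"}


def histogram(pixels, bins=4):
    """Normalized histogram of grayscale pixel values in [0, 255]."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


class ObjectLearner:
    """Toy learner: use (activity) labels structure (visual features)."""

    def __init__(self):
        # object label -> list of stored feature histograms
        self.models = defaultdict(list)

    def observe(self, activity, region_pixels):
        """Store features of the interacted region under the implied label."""
        label = AFFORDANCES.get(activity)
        if label is not None:
            self.models[label].append(histogram(region_pixels))
        return label

    def classify(self, region_pixels, threshold=0.5):
        """Return the best-matching learned label, or None if no match."""
        h = histogram(region_pixels)
        best_label, best_score = None, threshold
        for label, stored_hists in self.models.items():
            for stored in stored_hists:
                score = intersection(h, stored)
                if score > best_score:
                    best_label, best_score = label, score
        return best_label


learner = ObjectLearner()
# An (assumed) activity recognizer reports "sitting" near a dark region.
learner.observe("sitting", [10, 20, 30, 40])
# A similar dark region seen later matches the learned "chair" model.
dark_region_label = learner.classify([15, 25, 35, 50])
# A bright region shares no histogram mass with the model and stays unlabeled.
bright_region_label = learner.classify([200, 210, 220, 230])
```

In this sketch neither cue suffices alone: the activity supplies a label but no appearance model, and the histogram supplies appearance but no label. Only their combination yields a reusable object definition.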
