Coordinating interactive vision behaviors for cognitive assistance

Most research in human-computer interaction (HCI) focuses on a seamless interface between a user and an application that is separated from the user in working space and/or control, such as navigating image databases, instructing robots, or retrieving information. The interaction paradigm of cognitive assistance goes a step further: the application assists the user in performing everyday tasks in his or her own environment, and the user and the system share control of those tasks. Such tight bidirectional interaction in realistic environments demands cognitive system skills like context awareness, attention, learning, and reasoning about the external environment. The system therefore needs to integrate a wide variety of visual functions, including localization, object tracking and recognition, action recognition, and interactive object learning. In this paper we show how different kinds of system behaviors are realized using the Active Memory Infrastructure, which provides the technical basis for distributed computation and a data- and event-driven integration approach. A running augmented-reality system for cognitive assistance is presented that supports users in mixing beverages. The flexibility and generality of the system framework make it an ideal testbed for studying visual cues in human-computer interaction. We report results from first user studies.
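The data- and event-driven integration approach mentioned above can be illustrated with a minimal sketch: loosely coupled components (e.g. a tracker and a behavior module) coordinate only by inserting into and subscribing to a shared memory that emits events on change. The names here (`ActiveMemory`, `subscribe`, `insert`) are illustrative assumptions for the pattern, not the actual Active Memory Infrastructure API.

```python
# Hypothetical sketch of an event-driven shared "active memory";
# component coupling happens only via memory events, not direct calls.
from collections import defaultdict
from typing import Callable, Dict

class ActiveMemory:
    """Shared store that notifies subscribers on INSERT/REPLACE/REMOVE events."""

    def __init__(self):
        self._store: Dict[str, dict] = {}
        self._subscribers = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event: str, callback: Callable[[str, dict], None]):
        self._subscribers[event].append(callback)

    def _notify(self, event: str, key: str, data: dict):
        for cb in self._subscribers[event]:
            cb(key, data)

    def insert(self, key: str, data: dict):
        event = "REPLACE" if key in self._store else "INSERT"
        self._store[key] = data
        self._notify(event, key, data)

    def remove(self, key: str):
        data = self._store.pop(key)
        self._notify("REMOVE", key, data)

# A perception module writes a detection; a behavior module reacts to the event.
memory = ActiveMemory()
seen = []
memory.subscribe("INSERT", lambda key, data: seen.append((key, data["label"])))
memory.insert("object/0", {"label": "bottle", "pos": (0.3, 0.7)})
```

In a full system the callbacks would run in separate processes connected by middleware; the point of the pattern is that adding or removing a component never requires changing the others.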
