Representation and recognition of action in interactive spaces

This thesis presents new theory and technology for the representation and recognition of complex, context-sensitive human actions in interactive spaces. To represent action and interaction, a symbolic framework has been developed based on Roger Schank’s conceptualizations, augmented by a mechanism to represent the temporal structure of the sub-actions based on Allen’s interval algebra networks. To overcome the exponential nature of temporal constraint propagation in such networks, we have developed the PNF propagation algorithm, based on the projection of IA-networks into simplified, 3-valued (past, now, future) constraint networks called PNF-networks. The PNF propagation algorithm has been applied to an action recognition vision system that handles actions composed of multiple, parallel threads of sub-actions, in situations that cannot be handled efficiently by commonly used temporal representation schemes such as finite-state machines and HMMs. The PNF propagation algorithm is also the basis of interval scripts, a scripting paradigm for interactive systems that represents interaction as a set of temporal constraints between the individual components of the interaction. Unlike previously proposed non-procedural scripting methods, we use a strong temporal representation (allowing, for example, mutually exclusive actions) and perform control by propagating the temporal constraints in real time. These concepts have been tested in the context of four projects involving story-driven interactive spaces. The action representation framework has been used in the Intelligent Studio project to enhance the control of automatic cameras in a TV studio. Interval scripts have been extensively employed in the development of “SingSong”, a short interactive performance that introduced the idea of live interaction with computer graphics characters; in “It / I”, a full-length computer theater play; and in “It”, an interactive art installation based on the play “It / I” that realizes our concept of immersive stages, that is, interactive spaces that can be used by both performers and the public.
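The idea of projecting Allen relations onto past/now/future states can be illustrated with a small Python sketch. It is not the thesis's PNF propagation algorithm: the filtering step below is a plain arc-consistency loop, and the half-open interval convention, the representative endpoints, and all names (`pnf_pairs`, `project`, `propagate`, and the `grasp` / `pick_up` / `put_down` actions) are assumptions made for illustration only.

```python
P, N, F = "P", "N", "F"  # past, now, future

# Assumption: interval A is fixed at (4, 8) and each Allen relation "A rel B"
# is represented by one concrete choice of endpoints for B.
A_REF = (4, 8)
B_REF = {
    "before": (10, 12), "meets": (8, 12), "overlaps": (6, 12),
    "starts": (4, 12), "during": (2, 12), "finishes": (2, 8),
    "equal": (4, 8),
}

def state(interval, t):
    """PNF state of a half-open interval [start, end) at time t."""
    start, end = interval
    if t < start:
        return F
    return N if t < end else P

def pnf_pairs(relation):
    """(state of A, state of B) pairs that can co-occur under one relation."""
    b = B_REF[relation]
    points = sorted(set(A_REF + b))
    # States are constant between endpoints, so sampling one time before
    # the first endpoint plus every endpoint covers all cases.
    times = [points[0] - 1] + points
    return {(state(A_REF, t), state(b, t)) for t in times}

def project(relations):
    """Project a disjunction of Allen relations onto allowed PNF pairs."""
    allowed = set()
    for r in relations:
        allowed |= pnf_pairs(r)
    return allowed

def propagate(domains, constraints):
    """Drop PNF values that have no support, until nothing changes."""
    changed = True
    while changed:
        changed = False
        for (a, b), allowed in constraints.items():
            new_a = {x for x in domains[a]
                     if any((x, y) in allowed for y in domains[b])}
            new_b = {y for y in domains[b]
                     if any((x, y) in allowed for x in domains[a])}
            if new_a != domains[a] or new_b != domains[b]:
                domains[a], domains[b] = new_a, new_b
                changed = True
    return domains

# Hypothetical action: grasping happens during picking up, and picking up
# ends before (or exactly when) putting down starts.
constraints = {
    ("grasp", "pick_up"): project(["during"]),
    ("pick_up", "put_down"): project(["before", "meets"]),
}
# A sensor reports that "put_down" is happening now; the rest is unknown.
domains = {"grasp": {P, N, F}, "pick_up": {P, N, F}, "put_down": {N}}
result = propagate(domains, constraints)
```

In this toy network, observing that `put_down` is happening now is enough to conclude that both `pick_up` and `grasp` are already in the past. Each node carries only three possible values, which is what makes this kind of filtering cheap compared with reasoning over the full interval algebra network.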
