The meaning of action: a review on action recognition and mapping

In this paper, we analyze the approaches taken to date within the computer vision, robotics and artificial intelligence communities to the representation, recognition, synthesis and understanding of action. We consider action at different levels of complexity and point the reader to the relevant literature. We then place this literature in context and outline a possible interpretation of action that takes into account action recognition, action synthesis and task-level planning.
