Human action recognition with sparse classification and multiple‐view learning

Employing multiple camera viewpoints in the recognition of human actions improves recognition performance. This paper presents a feature fusion approach that efficiently combines 2D observations extracted from different camera viewpoints. Multiple-view dimensionality reduction is employed to learn a common parameterization of the 2D action descriptors computed for each of the available viewpoints. Canonical correlation analysis and its variants are used to obtain such parameterizations. A sparse sequence classifier based on L1 regularization is proposed so that the proper number of dimensions of the common parameterization does not have to be chosen in advance. The proposed system is applied to the classification of the Inria Xmas Motion Acquisition Sequences (IXMAS) data set with successful results.
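The central argument above is that an L1 penalty lets the classifier discard unneeded dimensions of the common parameterization by itself, so their number need not be fixed beforehand. The minimal Python sketch below illustrates that effect under simplifying assumptions. It uses a plain, per-vector L1-regularized logistic regression trained by proximal gradient descent (ISTA) instead of the paper's sequence classifier. The synthetic data, the function names (`make_demo_data`, `fit_l1_logistic`) and the parameter values (`lam`, `step`, `iters`) are hypothetical stand-ins for descriptors that have already been projected onto a common (e.g. CCA) space.

```python
import math
import random


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero, clipping small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=0.5, iters=300):
    """Hypothetical sketch: L1-regularized logistic regression trained by ISTA.

    Minimizes mean log-loss + lam * sum_j |w_j|; the bias b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual p - y of the current model on this example.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(d):
                grad_w[j] += r * xi[j] / n
        # Gradient step on the smooth log-loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(d)]
        b -= step * grad_b
    return w, b


def make_demo_data(n_base=60, seed=0):
    """Synthetic, hypothetical stand-in for features projected onto a common space.

    Dims 0-1 carry the class; dims 2-4 are noise. Every example is stored twice,
    the second time with its noise coordinates negated, so the noise dims are
    exactly uncorrelated with the label.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_base):
        label = i % 2
        sign = 1.0 if label else -1.0
        signal = [sign + rng.uniform(-0.3, 0.3), sign + rng.uniform(-0.3, 0.3)]
        noise = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        X.append(signal + noise)
        X.append(signal + [-v for v in noise])
        y += [label, label]
    return X, y


X, y = make_demo_data()
w, b = fit_l1_logistic(X, y, lam=0.05)
print([round(v, 3) for v in w])  # dims 2-4 (noise) are exactly 0.0
```

Raising `lam` zeroes more weights (at `lam=1.0` every weight in this toy set is zero), so the penalty strength, rather than a hand-picked dimensionality, decides how many projected dimensions the classifier actually uses. Soft-thresholding is what produces exact zeros; a plain subgradient step on the L1 term would only oscillate around zero.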
