Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions

We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models (GMMs) and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, and also measure their sensitivity to more challenging conditions (variations in scale and translation). Despite recent advancements in handling manifolds, manifold-based techniques obtain the lowest performance, and their kernel representations are less stable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV copes best with moderate scale and translation changes.
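As an illustration of the simplest manifold-based baseline named above, the following is a minimal sketch of a nearest neighbour classifier on symmetric positive definite (SPD) matrices. It rests on assumptions not stated in the abstract: each video is taken to be summarised by a covariance matrix of per-pixel features, and the log-Euclidean distance is used as the manifold metric. The function names and the regularisation constant `eps` are hypothetical and chosen only for this example.

```python
import numpy as np


def spd_log(m):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(m)
    # Scale each eigenvector column by log(eigenvalue), then rotate back.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Frobenius norm of the difference of matrix logarithms (assumed metric)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


def covariance_descriptor(features, eps=1e-6):
    """Covariance of an (n_samples, d) feature array, regularised to stay SPD.

    `eps` is a hypothetical regularisation constant, not taken from the source.
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def nearest_neighbour_label(query, gallery, labels):
    """Return the label of the gallery matrix closest to the query."""
    distances = [log_euclidean_distance(query, g) for g in gallery]
    return labels[int(np.argmin(distances))]
```

A query video would be passed through `covariance_descriptor` and then compared against the training descriptors with `nearest_neighbour_label`. The linear subspace case would follow the same pattern with a subspace distance in place of `log_euclidean_distance`.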
