Recognising occluded multi-view actions using local nearest neighbour embedding

We propose a robust learning-free algorithm: local nearest neighbour embedding (LNNE). We introduce three multi-view fusion scenarios to test the LNNE method. We conduct extensive experiments on two multi-view action data sets with occlusions, where the LNNE method achieves significant performance improvements in all scenarios.

Recent advances in multi-sensor technologies and algorithms have driven significant progress in human action recognition systems, especially in dealing with realistic scenarios. However, partial occlusion, a major obstacle in real-world applications, has not received sufficient attention in the action recognition community. In this paper, we extensively investigate how occlusion can be addressed by multi-view fusion. Specifically, we propose a robust representation called local nearest neighbour embedding (LNNE). We then extend the LNNE method to three multi-view fusion scenarios. Additionally, we provide a detailed analysis of the proposed voting strategy from the boosting point of view. We evaluate our approach on both synthetic and realistic occluded databases, and the LNNE method outperforms the state-of-the-art approaches in all tested scenarios.
