Open-view human action recognition based on linear discriminant analysis

Over the last decades, the action recognition task has evolved from single-view recordings to unconstrained environments. Recently, multi-view action recognition has become a hot topic in computer vision. However, only a few works have focused on open-view action recognition, a common problem in the real world: recognizing actions from an unseen view without using any information from that view. To address this issue, we first introduce a novel multi-view surveillance action dataset and benchmark several state-of-the-art algorithms on it. The results show that the performance of these algorithms drops considerably under open-view constraints. We then propose a novel open-view action recognition method based on linear discriminant analysis. By exploiting category information, the method learns a common space for action samples captured from different views, which yields good results in open-view action recognition.
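The core idea can be illustrated with a minimal sketch: train a linear discriminant analysis (LDA) model on labeled samples pooled from several seen views, so that class labels pull same-action samples from different views together in the learned discriminative space, then classify samples from a held-out, unseen view. The sketch below uses synthetic features and scikit-learn's `LinearDiscriminantAnalysis`; the paper's actual features and exact LDA formulation may differ.

```python
# Hedged sketch of LDA-based common-space learning for open-view
# action recognition. All data is synthetic: each "view" applies its
# own linear distortion to shared class prototypes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_classes, dim, n_per_class = 5, 64, 30

# Shared class prototypes (view-independent action signatures).
prototypes = rng.normal(size=(n_classes, dim))

def make_view(distortion):
    """Generate one camera view: distorted prototypes plus noise."""
    X, y = [], []
    for c in range(n_classes):
        samples = prototypes[c] @ distortion \
                  + 0.3 * rng.normal(size=(n_per_class, dim))
        X.append(samples)
        y.extend([c] * n_per_class)
    return np.vstack(X), np.array(y)

def random_distortion():
    # Small random perturbation of the identity, one per view.
    return np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))

# Two seen training views, one unseen test view (open-view setting).
X_a, y_a = make_view(random_distortion())
X_b, y_b = make_view(random_distortion())
X_test, y_test = make_view(random_distortion())

X_train = np.vstack([X_a, X_b])
y_train = np.concatenate([y_a, y_b])

# LDA uses the category labels to learn a common discriminative space
# (at most n_classes - 1 dimensions) across the pooled seen views.
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

acc = lda.score(X_test, y_test)
print(f"accuracy on unseen view: {acc:.2f}")
```

Because LDA maximizes between-class scatter relative to within-class scatter, view-specific variation that is shared across classes tends to be absorbed into the within-class term and suppressed, which is why the projection transfers to an unseen view in this toy setup.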
