Action analysis and video summarisation to efficiently manage and interpret video data

In the last few years the volume of video data has grown exponentially. Online sites such as YouTube and Netflix attract considerable audiences who upload, access, and actively interact with their content. Furthermore, millions of video surveillance cameras have been installed around the world to monitor shopping centres, universities, parks, streets, and public places in general. It is therefore becoming indispensable to manage and interpret this massive amount of video data efficiently and automatically. Computer vision is the science responsible for processing images and videos. The main goal of this thesis is to contribute towards efficiently managing and interpreting video data via action analysis and video summarisation. Action analysis using computer vision techniques is essential given that the majority of available videos contain human actions. Action analysis is a broad topic that covers several areas, including action recognition, joint action segmentation and recognition, and action assessment. For the action recognition problem, two schools of thought have gained attention recently. On one hand, traditional video encoders and their variants, which include the popular Bag of Visual Words and the Fisher Vector representation, are the main reference for action recognition. On the other hand, statistical modelling of actions via Riemannian manifolds offers an interesting alternative to traditional video encoders. To this end, we provide a detailed analysis of the performance of these two schools of thought for action recognition under the same set of features across several datasets.
The detailed analysis also investigates when these methods break and how performance degrades when the datasets have challenging conditions that are likely to be encountered in uncontrolled situations. To address the joint action segmentation and recognition problem, we propose two hierarchical systems where a given video is processed as a sequence of overlapping temporal windows. Both proposed systems require fewer parameters to be optimised and avoid the need for a custom dynamic programming definition as in previous works. The last action analysis problem this thesis focuses on is action assessment, which consists of assessing how well people perform actions and is still in its early stages. Learning how to automatically assess actions can be a valuable tool. For instance, catwalk competitions require human assessment, which may be highly subjective. However, to date, nobody has attempted to apply computer vision techniques to automatically assess the quality of how someone strides down the catwalk. Action analysis is not the only way to process video information. Video summarisation is an active area of research within the computer vision community. Instead of tedious manual review of many hours of video, video summarisation aims to provide a concise and informative summary of the video. We present a novel video summarisation method based on a Bag-of-visualTextures approach that is computationally efficient and effective. Our approach can be used for short-term and long-term videos. On long-term videos the proposed system considerably reduces the amount of footage with only minor degradation in the information content.
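
As a rough illustration of what processing a video "as a sequence of overlapping temporal windows" can look like, the sketch below splits a frame range into fixed-length windows that share frames with their neighbours. The function name `overlapping_windows`, the parameters `window_len` and `stride`, and the choice to drop an incomplete trailing window are all assumptions made for this example; the abstract does not specify how the proposed systems define their windows.

```python
from typing import List, Tuple


def overlapping_windows(num_frames: int, window_len: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each temporal window.

    Hypothetical helper: consecutive windows overlap whenever stride < window_len.
    Trailing frames that do not fill a complete window are dropped (an assumption).
    """
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")

    windows = []
    start = 0
    # Slide the window forward by `stride` frames until it no longer fits.
    while start + window_len <= num_frames:
        windows.append((start, start + window_len))
        start += stride
    return windows


if __name__ == "__main__":
    # A 10-frame video, windows of 4 frames, advancing 2 frames at a time.
    for start, end in overlapping_windows(num_frames=10, window_len=4, stride=2):
        print(f"frames {start}..{end - 1}")
```

In a pipeline of this kind, each window would typically be described and classified on its own, and the per-window decisions would then be combined to label the whole video; how that combination is done is left open here.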
