Online Action Recognition via Nonparametric Incremental Learning

We introduce an online action recognition system that can be combined with any set of frame-by-frame feature descriptors. Our system covers the frame feature space with classifiers whose distribution adapts to the hardness of locally approximating the Bayes optimal classifier. An efficient nearest-neighbour search finds and combines the local classifiers closest to the frames of a new video to be classified. The advantages of our approach are: incremental training, frame-by-frame real-time prediction, nonparametric predictive modelling, video segmentation for continuous action recognition, no need to trim videos to equal lengths, and a single tuning parameter (which, for large datasets, can be safely set to the diameter of the feature space). Experiments on standard benchmarks show that our system is competitive with state-of-the-art nonincremental and incremental baselines.

Keywords: action recognition, incremental learning, continuous action recognition, nonparametric model, real time, multivariate time series classification, temporal classification
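The idea of covering the frame feature space with local classifiers and combining the nearest ones can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes fixed-radius balls that each store label counts, a brute-force nearest-centre search in place of an efficient one, and a majority vote over frames. The class name `BallCoverClassifier`, its methods, and the toy data are all hypothetical.

```python
import math
from collections import Counter


class BallCoverClassifier:
    """Hypothetical sketch: fixed-radius balls, each storing label counts."""

    def __init__(self, radius):
        self.radius = radius   # the single tuning parameter (assumed fixed here)
        self.centers = []      # ball centres (frame feature vectors)
        self.counts = []       # one Counter of observed labels per ball

    def _nearest(self, x):
        # Brute-force search for the closest centre; a real system would
        # use an efficient proximity-search structure instead.
        best_i, best_d = None, math.inf
        for i, c in enumerate(self.centers):
            d = math.dist(x, c)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def update(self, x, label):
        # Incremental training: one labelled frame at a time.
        i, d = self._nearest(x)
        if i is None or d > self.radius:
            # Frame not covered by any ball: open a new local classifier.
            self.centers.append(tuple(x))
            self.counts.append(Counter({label: 1}))
        else:
            # Frame covered: update the label statistics of the nearest ball.
            self.counts[i][label] += 1

    def predict_frame(self, x):
        i, _ = self._nearest(x)
        if i is None:
            raise ValueError("model has not seen any training frames")
        return self.counts[i].most_common(1)[0][0]

    def predict_video(self, frames):
        # Combine the per-frame predictions by majority vote.
        votes = Counter(self.predict_frame(x) for x in frames)
        return votes.most_common(1)[0][0]


# Toy usage with made-up 2-D "frame features".
model = BallCoverClassifier(radius=1.0)
training = [((0.0, 0.0), "wave"), ((0.2, 0.1), "wave"),
            ((5.0, 5.0), "jump"), ((5.1, 4.9), "jump")]
for features, label in training:
    model.update(features, label)
print(model.predict_video([(0.1, 0.0), (0.3, 0.2), (4.0, 4.0)]))
```

Because every frame is classified independently, videos of different lengths need no trimming in this sketch, and a prediction is available after each frame. A large radius yields few balls and a small one yields many, which is how a single parameter controls model size here.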
