Detection of human actions from a single example

We present an algorithm for detecting human actions based on a single video example of the action of interest. The proposed method is unsupervised and does not require learning, segmentation, or motion estimation. The novel features it employs are based on space-time locally adaptive regression kernels. The method densely computes these space-time local regression kernels (i.e., local descriptors) from the query video; each kernel measures the likeness of a voxel to its spatio-temporal surroundings. Salient features are then extracted from these descriptors using principal component analysis (PCA) and efficiently compared against analogous features from the target video using a matrix generalization of the cosine similarity measure. The algorithm yields a scalar resemblance volume, in which each voxel indicates the likelihood of similarity between the query video and the target-video cube at that location. By employing non-parametric significance tests and non-maxima suppression, we accurately detect the presence and location of actions similar to the given query video. High performance is demonstrated on a challenging set of action data [8], indicating successful detection of multiple complex actions even in the presence of fast motions.
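
To make the comparison step concrete, the following is a minimal sketch of one common matrix generalization of cosine similarity: the Frobenius inner product of two feature matrices divided by the product of their Frobenius norms. The abstract does not spell out the exact formula, so this choice is an assumption. The function names (`matrix_cosine_similarity`, `resemblance_profile`, `best_match`) are hypothetical. For brevity the sliding comparison runs along a single axis rather than over full space-time cubes, and a plain argmax stands in for the significance tests and non-maxima suppression.

```python
import math


def matrix_cosine_similarity(fq, ft):
    """Frobenius inner product of two equally shaped matrices,
    normalized by the product of their Frobenius norms.

    Each matrix is a list of rows; a row plays the role of one
    voxel's reduced feature vector (assumed layout, for illustration).
    """
    if len(fq) != len(ft) or any(len(a) != len(b) for a, b in zip(fq, ft)):
        raise ValueError("feature matrices must have the same shape")
    inner = sum(a * b for ra, rb in zip(fq, ft) for a, b in zip(ra, rb))
    norm_q = math.sqrt(sum(a * a for row in fq for a in row))
    norm_t = math.sqrt(sum(b * b for row in ft for b in row))
    if norm_q == 0.0 or norm_t == 0.0:
        return 0.0  # convention chosen here for all-zero features
    return inner / (norm_q * norm_t)


def resemblance_profile(query, target):
    """Slide the query feature matrix along the rows of a longer target
    feature matrix and record the similarity at every offset.

    This is a one-dimensional stand-in for the resemblance volume.
    """
    n = len(query)
    if n == 0 or len(target) < n:
        raise ValueError("target must contain at least as many rows as query")
    return [
        matrix_cosine_similarity(query, target[start:start + n])
        for start in range(len(target) - n + 1)
    ]


def best_match(profile):
    """Index of the largest similarity value (simplified peak picking)."""
    return max(range(len(profile)), key=profile.__getitem__)
```

With this definition the similarity is invariant to a positive rescaling of either feature matrix and lies in [-1, 1], so the profile can be compared across offsets without further normalization.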

[1] N. L. Johnson, et al. Multivariate Analysis, 1958, Nature.

[2] Maurice M. Tatsuoka, et al. Multivariate Analysis of Variance, 2014, Applied Univariate, Bivariate, and Multivariate Statistics.

[3] Frederic Devernay. A Non-Maxima Suppression Method for Edge Detection with Sub-Pixel Accuracy, 1995.

[4] Peyman Milanfar, et al. Kernel Regression for Image Processing and Reconstruction, 2007, IEEE Transactions on Image Processing.

[5] Eli Shechtman, et al. Matching Local Self-Similarities across Images and Videos, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Jean-Michel Morel, et al. Nonlocal Image and Movie Denoising, 2008, International Journal of Computer Vision.

[7] Eli Shechtman, et al. Space-Time Behavior-Based Correlation-OR-How to Tell If Two Underlying Motion Fields Are Similar Without Computing Them?, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Patrick Bouthemy, et al. Space-Time Adaptation for Patch-Based Image Sequence Restoration, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Peyman Milanfar, et al. Deblurring Using Regularized Locally Adaptive Kernel Regression, 2008, IEEE Transactions on Image Processing.

[10] Rama Chellappa, et al. Machine Recognition of Human Activities: A Survey, 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[11] Michael Elad, et al. Super-Resolution Without Explicit Subpixel Motion Estimation, 2009, IEEE Transactions on Image Processing.

[12] Ming Liu, et al. Hierarchical Space-Time Model Enabling Efficient Search for Human Actions, 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[13] Peyman Milanfar, et al. Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Mubarak Shah, et al. Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.