On-line Action Recognition from Sparse Feature Flow

The fast and robust recognition of human actions is important for many video-based applications in human-computer interaction and surveillance. Although current recognition algorithms deliver increasingly accurate results, their usability for on-line applications is still limited. To bridge this gap, an on-line video-based action recognition system is presented that combines histograms of sparse feature point flow with HMM-based action recognition. Using feature point motion is computationally more efficient than the more common histograms of optical flow (HoF) while reaching a similar recognition accuracy. For recognition we use low-level action units that are modeled by hidden Markov models (HMMs). They are assembled by a context-free grammar to recognize complex activities. The concatenation of small action units into higher-level tasks allows the robust recognition of action sequences as well as a continuous on-line evaluation of the ongoing activity. The average runtime is around 34 ms for processing one frame and around 20 ms for calculating one hypothesis for the current action. Assuming that one hypothesis per second is needed, the system can provide a mean capacity of 25 fps. The system's accuracy is compared with state-of-the-art recognition results on a common benchmark dataset as well as with a marker-based recognition system, showing similar results for the given evaluation scenario. The presented approach can be seen as a step towards the on-line evaluation and recognition of human motion directly from video data.
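To illustrate what a histogram of sparse feature point flow could look like, the following is a minimal sketch, not the descriptor used in the presented system. It assumes that feature points have already been tracked between two consecutive frames (for example with a Lucas-Kanade style tracker) and that their displacements are binned by direction and weighted by magnitude. The function name `flow_direction_histogram`, the 8 direction bins, and the `min_disp` threshold are hypothetical choices made for this example.

```python
import math

def flow_direction_histogram(prev_pts, next_pts, n_bins=8, min_disp=0.5):
    """Magnitude-weighted, normalized histogram of displacement directions.

    prev_pts, next_pts: sequences of (x, y) positions of the same tracked
    points in two consecutive frames. n_bins and min_disp are assumed values.
    """
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for (x0, y0), (x1, y1) in zip(prev_pts, next_pts):
        dx, dy = x1 - x0, y1 - y0
        mag = math.hypot(dx, dy)
        if mag < min_disp:
            continue  # skip near-static points (assumed noise threshold)
        # Map the direction to [0, 2*pi) and pick the matching bin.
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        idx = min(int(angle / bin_width), n_bins - 1)
        hist[idx] += mag
    total = sum(hist)
    # Normalize so the feature does not depend on the number of tracked points.
    return [h / total for h in hist] if total > 0 else hist


if __name__ == "__main__":
    prev_pts = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
    next_pts = [(3.0, 1.0), (-1.0, 3.0), (0.1, 0.1)]
    print(flow_direction_histogram(prev_pts, next_pts))
```

Because only a few tracked points per frame are processed instead of a dense flow field, a feature of this kind is cheap to compute; a sequence of such per-frame vectors could then serve as the observation sequence for HMM-based recognition.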
