Body gesture classification based on bag-of-features in the frequency domain of motion

In this paper, we propose a method for semantic motion retrieval in large data sets of human motions that classifies body gestures automatically. The method extracts spatio-temporal features from the motions by expressing them in the frequency domain, and then transforms these features into a bag-of-words representation to accelerate computation and to emphasize their semantic aspect. The approach is inspired by techniques from natural language processing and image processing. We conducted experiments to evaluate motion-classification performance on data sets captured with a motion capture system. The experiments confirmed that our method improves classification performance and drastically reduces computational time.
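The pipeline the abstract describes can be sketched in two stages: per-joint frequency-domain descriptors, followed by vector quantization into a bag-of-words histogram. The sketch below is a minimal illustration under assumed inputs (joint-angle trajectories as a frames-by-joints array); the function names, the FFT-magnitude descriptor, and the tiny k-means codebook are illustrative choices, not the authors' implementation.

```python
import numpy as np

def frequency_features(motion, n_coeffs=8):
    """Spatio-temporal descriptors: low-frequency FFT magnitudes per joint.

    motion: (T, J) array of joint-angle trajectories over T frames.
    Returns a (J, n_coeffs) array, one descriptor per joint.
    """
    spectra = np.abs(np.fft.rfft(motion, axis=0))  # (T//2 + 1, J)
    return spectra[:n_coeffs].T

def build_codebook(descriptors, k=4, iters=20, seed=0):
    """Quantize descriptors into k 'motion words' with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """L1-normalized histogram of nearest codewords for one motion clip."""
    d2 = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
    labels = np.argmin(d2, axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

In a retrieval or classification setting, the codebook would be built from descriptors pooled over the whole training set, and each clip's histogram would then be fed to a classifier or compared by histogram distance.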
