Paper information - BoVDW: Bag-of-Visual-and-Depth-Words for gesture recognition

We present a Bag-of-Visual-and-Depth-Words (BoVDW) model for gesture recognition, an extension of the Bag-of-Visual-Words (BoVW) model that benefits from the multimodal fusion of visual and depth features. State-of-the-art RGB and depth features, including a newly proposed depth descriptor, are analysed and combined in a late fusion fashion. The method is integrated into a continuous gesture recognition pipeline, where the Dynamic Time Warping (DTW) algorithm is used to perform prior segmentation of gestures. Results on public data sets, obtained within our gesture recognition pipeline, show better performance than a standard BoVW model.
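
For readers unfamiliar with DTW, the following is a minimal sketch of the textbook DTW alignment cost between two sequences. It is not the segmentation procedure of the paper: the function name `dtw_distance`, the 1-D inputs, and the absolute-difference local cost are assumptions chosen for illustration. A real pipeline would presumably compare per-frame feature vectors and apply some decision rule to the accumulated cost.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (illustrative only).

    Assumption: the local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because the alignment may stretch or compress either sequence in time, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length.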
