Paper Information - FlickerNet: Adaptive 3D Gesture Recognition from Sparse Point Clouds

Recent studies on gesture recognition use deep convolutional neural networks (CNNs) to extract spatio-temporal features from individual frames or short video clips. However, extracting features frame by frame introduces a large amount of redundant and ambiguous gesture information. Inspired by the flicker fusion phenomenon, we propose a simple but efficient network, called FlickerNet, to recognize gestures from a sequence of sparse point clouds sampled from depth videos. Unlike existing CNN-based methods, FlickerNet adaptively recognizes hand postures and hand motions from the flicker of gestures: the point clouds of stable hand postures and the sparse point-cloud motion of fast hand motions. Notably, FlickerNet significantly outperforms previous state-of-the-art approaches on two challenging datasets with much higher computational efficiency.
