Intelligent human hand gesture recognition by local-global fusing quality-aware features

Abstract Humans have a pair of hands that can form sophisticated gestures conveying different semantic meanings. In practice, people with disabilities can communicate with one another through hand gestures. However, people without specific training find it challenging to understand the meaning of these gestures. In this work, we propose a novel quality-aware human–computer interaction (HCI) framework for understanding sophisticated human hand gestures, whose key technique is a multi-view feature learning algorithm that optimally fuses hand silhouettes, finger positions, and hand motions. More specifically, for each human hand we first extract multimodal visual features: hand silhouettes obtained by background removal, finger positions localized using active learning, and hand motions captured by optical flow. Afterward, we propose a multi-view learning-based feature fusion scheme that optimizes the multimodal features both locally and globally. From this optimization, the optimal weights of the different feature channels can be calculated. Using the optimally fused feature, we train a multi-class kernel machine that assigns each hand gesture to a category with a particular meaning. Comprehensive experimental results on a large-scale human hand gesture data set demonstrate that our method achieves highly competitive recognition accuracy at a lower time cost. In addition, our method robustly supports an arbitrary number of semantic categories of human hand gestures. Finally, the proposed hand gesture understanding technique can be conveniently incorporated into many state-of-the-art HCI systems.
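The fusion-then-classification pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the channel weights are fixed by hand rather than learned by the local-global optimization, a kernel nearest-centroid rule stands in for the multi-class kernel machine, and the feature vectors, view names, and function names (`rbf_kernel`, `fused_kernel`, `predict`) are all hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors of one view."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def fused_kernel(sample_a, sample_b, weights, gamma=1.0):
    """Weighted sum of per-view kernels; weights are normalized to sum to 1.

    Each sample is a dict mapping a view name to its feature vector.
    """
    total = sum(weights.values())
    return sum(
        (w / total) * rbf_kernel(sample_a[view], sample_b[view], gamma)
        for view, w in weights.items()
    )

def predict(query, train, weights, gamma=1.0):
    """Kernel nearest-centroid classifier (a stand-in for a kernel machine).

    Uses ||phi(x) - mu_c||^2 = k(x, x) - 2 * mean_i k(x, x_i) + mean_ij k(x_i, x_j).
    """
    by_class = {}
    for sample, label in train:
        by_class.setdefault(label, []).append(sample)

    best_label, best_dist = None, float("inf")
    self_sim = fused_kernel(query, query, weights, gamma)
    for label, members in by_class.items():
        n = len(members)
        cross = sum(fused_kernel(query, m, weights, gamma) for m in members) / n
        within = sum(
            fused_kernel(a, b, weights, gamma) for a in members for b in members
        ) / (n * n)
        dist = self_sim - 2.0 * cross + within
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical channel weights (assumed, not learned) and toy training data.
weights = {"silhouette": 0.4, "finger": 0.4, "motion": 0.2}
train = [
    ({"silhouette": [1.0, 0.9], "finger": [1.0, 1.0, 1.0], "motion": [0.1]}, "open_palm"),
    ({"silhouette": [0.9, 1.0], "finger": [0.9, 1.0, 0.9], "motion": [0.0]}, "open_palm"),
    ({"silhouette": [0.2, 0.1], "finger": [0.0, 0.1, 0.0], "motion": [0.1]}, "fist"),
    ({"silhouette": [0.1, 0.2], "finger": [0.1, 0.0, 0.0], "motion": [0.0]}, "fist"),
]

palm_query = {"silhouette": [0.95, 0.95], "finger": [1.0, 0.9, 1.0], "motion": [0.05]}
fist_query = {"silhouette": [0.15, 0.15], "finger": [0.0, 0.0, 0.1], "motion": [0.05]}
palm_label = predict(palm_query, train, weights)
fist_label = predict(fist_query, train, weights)
```

Because the classifier only needs pairwise fused kernel values, adding a gesture category in this sketch amounts to adding labeled samples, which loosely mirrors the abstract's claim of supporting an arbitrary number of categories.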
