WiGRUNT: WiFi-Enabled Gesture Recognition Using Dual-Attention Network

Gestures constitute an important form of nonverbal communication in which bodily actions convey messages, either on their own or in parallel with spoken words. Recently, WiFi-sensing-based gesture recognition has emerged as a promising direction thanks to its inherent merits: it is device-free, works in non-line-of-sight conditions, and is privacy-friendly. However, current WiFi-based approaches mainly rely on domain-specific training because they do not know ``\emph{where to look}'' and ``\emph{when to look}''. To this end, we propose WiGRUNT, a WiFi-enabled gesture recognition system using a dual-attention network, which mimics how a keen human observer interprets a gesture regardless of environment variations. The key insight is to train the network to dynamically focus on the domain-independent features of a gesture in the WiFi Channel State Information (CSI) via a spatial-temporal dual-attention mechanism. WiGRUNT is built on a Deep Residual Network (ResNet) backbone to evaluate the importance of spatial-temporal clues and exploit their inbuilt sequential correlations for fine-grained gesture recognition. We evaluate WiGRUNT on the public Widar3 dataset and show that it significantly outperforms its state-of-the-art rivals, achieving the best reported performance both in-domain and cross-domain.
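To make the spatial-temporal dual-attention idea concrete, the sketch below shows one possible way such a module could sit on top of a ResNet backbone. This is an illustrative PyTorch sketch, not the authors' implementation: the class names (\texttt{SpatialAttention}, \texttt{TemporalAttention}, \texttt{DualAttentionGestureNet}), the ResNet-18 choice, the assumption that CSI is preprocessed into an image-like tensor with time along one axis, and the six-class output are all assumptions for illustration.

\begin{verbatim}
# Minimal sketch (assumptions noted above), NOT the WiGRUNT reference code.
import torch
import torch.nn as nn
import torchvision.models as models


class SpatialAttention(nn.Module):
    """Weights each spatial location of the CSI feature map ("where to look")."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                          # x: (B, C, H, W)
        attn = torch.sigmoid(self.conv(x))         # (B, 1, H, W) attention map
        return x * attn                            # re-weighted features


class TemporalAttention(nn.Module):
    """Weights each time step of the feature sequence ("when to look")."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (B, T, D)
        attn = torch.softmax(self.score(x), dim=1) # (B, T, 1) per-step weights
        return (x * attn).sum(dim=1)               # (B, D) attended summary


class DualAttentionGestureNet(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        resnet = models.resnet18(weights=None)
        # Keep the convolutional stem, drop the average pool and classifier.
        self.stem = nn.Sequential(*list(resnet.children())[:-2])
        self.spatial_attn = SpatialAttention(512)
        self.temporal_attn = TemporalAttention(512)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):                          # x: (B, 3, H, W) CSI "image"
        feat = self.spatial_attn(self.stem(x))     # (B, 512, H', W')
        # Treat one spatial axis (here W', assumed to be time) as the sequence.
        feat = feat.mean(dim=2).permute(0, 2, 1)   # (B, W', 512)
        feat = self.temporal_attn(feat)            # (B, 512)
        return self.classifier(feat)


if __name__ == "__main__":
    model = DualAttentionGestureNet(num_classes=6)
    dummy_csi = torch.randn(2, 3, 224, 224)        # two synthetic CSI samples
    print(model(dummy_csi).shape)                  # torch.Size([2, 6])
\end{verbatim}

The two attention stages mirror the abstract's framing: the spatial branch learns where in the CSI representation the gesture-relevant energy lies, while the temporal branch learns when along the sequence it matters, with the ResNet features supplying the shared representation.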