Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum

This paper proposes a novel machine learning architecture specifically designed for radio-frequency-based gesture recognition. We focus on high-frequency (60 GHz), short-range, radar-based sensing, in particular Google's Soli sensor. The signal has unique properties, such as resolving motion at a very fine level and allowing segmentation in range and velocity spaces rather than in image space. This enables recognition of new types of inputs but poses significant difficulties for the design of input recognition algorithms. The proposed algorithm is capable of detecting a rich set of dynamic gestures and can resolve small motions of fingers in fine detail. Our technique is based on an end-to-end trained combination of deep convolutional and recurrent neural networks. The algorithm achieves high recognition rates (average 87%) on a challenging set of 11 dynamic gestures and generalizes well across 10 users. The proposed model runs on commodity hardware at 140 Hz (CPU only).
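
The abstract describes an end-to-end trained combination of convolutional and recurrent networks over radar signal representations, but gives no implementation details. Below is a minimal illustrative sketch, not the authors' code: a per-frame CNN over range-Doppler frames feeding an LSTM, with a linear classifier over 11 gesture classes. The frame resolution (32x32), channel counts, hidden size, and the use of PyTorch are assumptions made purely for illustration.

```python
# Hypothetical CNN+LSTM gesture classifier sketch (assumed details, not from the paper).
import torch
import torch.nn as nn

class RadarGestureNet(nn.Module):
    def __init__(self, num_classes=11, hidden_size=128):
        super().__init__()
        # Per-frame convolutional feature extractor over range-Doppler images (assumed 1x32x32).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
        )
        # Recurrent layer models the temporal dynamics of the gesture.
        self.rnn = nn.LSTM(input_size=32 * 8 * 8, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, time, 1, 32, 32) sequence of range-Doppler frames.
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # per-frame CNN features
        out, _ = self.rnn(feats)                          # temporal modelling
        return self.classifier(out[:, -1])                # gesture logits per sequence

if __name__ == "__main__":
    model = RadarGestureNet()
    frames = torch.randn(4, 40, 1, 32, 32)  # 4 synthetic sequences of 40 frames each
    print(model(frames).shape)              # torch.Size([4, 11])
```

Training such a model end to end (CNN and LSTM jointly) on labelled gesture sequences mirrors the approach the abstract outlines; the specific per-frame representation, sequence length, and classifier head used in the paper may differ.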
