PRESS/HOLD/RELEASE Ultrasonic Gestures and Low Complexity Recognition Based on TCN

Targeting ultrasound-based gesture recognition, this paper proposes a new, universal PRESS/HOLD/RELEASE approach that expands the diversity of gestures that can be performed on smart devices such as mobile phones and IoT nodes. The new set of gestures is generated by interleaving PRESS/HOLD/RELEASE (P/H/R) patterns with gestures such as sweeps between multiple microphones. A P/H/R pattern is formed by a hand as it approaches the top of a microphone to generate a virtual Press, settles there for an arbitrary period of time to generate a virtual Hold, and finally departs to generate a virtual Release. The same hand can then sweep to a second microphone and perform another P/H/R. Interleaving P/H/R patterns in this way multiplies the number of distinguishable gestures. Assuming an on-board speaker transmitting an ultrasonic signal, detection is performed on the Doppler shifts produced by the hand as it approaches and departs the top of a microphone. These Doppler shifts are captured as a sequence of down-mixed ultrasonic spectrogram frames, and a Temporal Convolutional Network (TCN) is trained to classify the P/H/R patterns under different environmental noises. Our experimental results show that P/H/R patterns at the top of a microphone can be recognized with 96.6% accuracy under different noise conditions. A group of P/H/R-based gestures has been tested on a commercial off-the-shelf (COTS) Samsung Galaxy S7 Edge. Several P/H/R-interleaved gestures (sweeps, long taps, etc.) are designed using two microphones and a single speaker, with as few as $\sim 5\mathrm{K}$ parameters and as little as $\sim 0.15$ million operations (MOPs) of compute per inference. The P/H/R-interleaved gestures are intuitive and therefore easy for end users to learn, paving the way for deployment in mass-produced smartphones and smart speakers.
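
As an illustration of the processing chain described above, the minimal PyTorch sketch below down-mixes a microphone capture around an assumed 20 kHz carrier, keeps the spectrogram bins around DC where the Doppler energy lands, and classifies the frame sequence with a small dilated-causal TCN. The sample rate, carrier frequency, FFT/hop sizes, layer widths, and the `doppler_frames`/`TinyTCN` names are illustrative assumptions, not the paper's actual configuration; the network is merely sized in the same few-thousand-parameter regime as the reported $\sim 5\mathrm{K}$ model.

```python
# A minimal sketch (not the authors' implementation) of the pipeline the
# abstract describes: down-mix an ultrasonic capture to baseband, form
# Doppler spectrogram frames, and classify them with a tiny TCN.
# All constants below are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

FS = 48_000          # assumed audio sample rate (Hz)
F_CARRIER = 20_000   # assumed ultrasonic carrier emitted by the speaker (Hz)
N_FFT = 512
HOP = 256
N_BINS = 32          # spectrogram bins kept around DC after down-mixing
N_CLASSES = 3        # PRESS / HOLD / RELEASE

def doppler_frames(x: np.ndarray) -> np.ndarray:
    """Down-mix x to baseband and keep the spectrogram bins around DC.

    A hand moving toward or away from the microphone shifts the reflected
    carrier by a few tens of Hz, so after mixing with exp(-j*2*pi*f_c*t)
    the Doppler energy sits in the bins closest to zero frequency.
    """
    t = np.arange(len(x)) / FS
    baseband = x * np.exp(-2j * np.pi * F_CARRIER * t)
    n_frames = 1 + (len(baseband) - N_FFT) // HOP
    lo = N_FFT // 2 - N_BINS // 2
    frames = np.stack([
        np.fft.fftshift(np.abs(np.fft.fft(
            baseband[i * HOP: i * HOP + N_FFT])))[lo: lo + N_BINS]
        for i in range(n_frames)
    ])                                   # shape: (time, N_BINS)
    return frames.astype(np.float32)

class TinyTCN(nn.Module):
    """Stack of dilated causal 1-D convolutions over the frame sequence."""
    def __init__(self, n_in=N_BINS, n_hid=8, n_out=N_CLASSES):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = n_in
        for d in (1, 2, 4, 8):           # growing dilation widens the context
            self.convs.append(nn.Conv1d(ch, n_hid, kernel_size=3,
                                        dilation=d, padding=2 * d))
            ch = n_hid
        self.head = nn.Linear(n_hid, n_out)

    def forward(self, x):                # x: (batch, N_BINS, time)
        T = x.shape[-1]
        h = x
        for conv in self.convs:
            # symmetric padding of 2*d plus trimming the tail == causal conv
            h = torch.relu(conv(h)[..., :T])
        return self.head(h.mean(dim=-1))  # pool over time -> class logits

if __name__ == "__main__":
    mic = np.random.randn(FS)                    # 1 s placeholder capture
    frames = doppler_frames(mic)                 # (time, N_BINS)
    model = TinyTCN()
    # A few thousand parameters, i.e. the same low-complexity regime as the
    # ~5K-parameter model the abstract reports.
    print("parameters:", sum(p.numel() for p in model.parameters()))
    x = torch.from_numpy(np.ascontiguousarray(frames.T)).unsqueeze(0)
    print(model(x).shape)                        # torch.Size([1, 3])
```

The choice of dilated causal convolutions is what keeps such a model cheap: each doubling of the dilation doubles the temporal receptive field without adding parameters, which is consistent with the low parameter and MOP counts the abstract claims for per-inference compute.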
