PRESS/HOLD/RELEASE Ultrasonic Gestures and Low Complexity Recognition Based on TCN

Targeting ultrasound-based gesture recognition, this paper proposes a new, universal PRESS/HOLD/RELEASE approach that expands the diversity of gestures that can be performed on smart devices such as mobile phones and IoT nodes. The new set of gestures is generated by interleaving PRESS/HOLD/RELEASE (P/H/R) patterns with gestures such as sweeps between multiple microphones. A P/H/R pattern is formed by a hand as it approaches the top of a microphone to generate a virtual Press, settles there for an arbitrary period of time to generate a virtual Hold, and finally departs to generate a virtual Release. The same hand can then sweep to a second microphone and perform another P/H/R. Interleaving P/H/R patterns in this way multiplies the number of distinguishable gestures. Assuming an on-board speaker transmitting an ultrasonic signal, detection is performed on the Doppler shifts produced by the hand as it approaches and departs the top of a microphone. These Doppler shifts are captured as a sequence of down-mixed ultrasonic spectrogram frames, and a Temporal Convolutional Network (TCN) is trained to classify the P/H/R patterns under different environmental noises. Our experimental results show that P/H/R patterns at the top of a microphone can be recognized with 96.6% accuracy under different noise conditions. A group of P/H/R-based gestures has been tested on a commercial off-the-shelf (COTS) Samsung Galaxy S7 Edge. Several P/H/R-interleaved gestures (sweeps, long taps, etc.) are designed using two microphones and a single speaker, with as few as $\sim 5\mathrm{K}$ parameters and as little as $\sim 0.15$ million operations (MOPs) of compute per inference. The P/H/R-interleaved gestures are intuitive and therefore easy for end users to learn, paving the way for deployment in mass-produced smartphones and smart speakers.
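
As an illustration of the processing chain described above, the minimal PyTorch sketch below down-mixes a microphone capture around an assumed 20 kHz carrier, keeps the spectrogram bins around DC where the Doppler energy lands, and classifies the frame sequence with a small dilated-causal TCN. The sample rate, carrier frequency, FFT/hop sizes, layer widths, and the `doppler_frames`/`TinyTCN` names are illustrative assumptions, not the paper's actual configuration; the network is merely sized in the same few-thousand-parameter regime as the reported $\sim 5\mathrm{K}$ model.

```python
# A minimal sketch (not the authors' implementation) of the pipeline the
# abstract describes: down-mix an ultrasonic capture to baseband, form
# Doppler spectrogram frames, and classify them with a tiny TCN.
# All constants below are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

FS = 48_000          # assumed audio sample rate (Hz)
F_CARRIER = 20_000   # assumed ultrasonic carrier emitted by the speaker (Hz)
N_FFT = 512
HOP = 256
N_BINS = 32          # spectrogram bins kept around DC after down-mixing
N_CLASSES = 3        # PRESS / HOLD / RELEASE

def doppler_frames(x: np.ndarray) -> np.ndarray:
    """Down-mix x to baseband and keep the spectrogram bins around DC.

    A hand moving toward or away from the microphone shifts the reflected
    carrier by a few tens of Hz, so after mixing with exp(-j*2*pi*f_c*t)
    the Doppler energy sits in the bins closest to zero frequency.
    """
    t = np.arange(len(x)) / FS
    baseband = x * np.exp(-2j * np.pi * F_CARRIER * t)
    n_frames = 1 + (len(baseband) - N_FFT) // HOP
    lo = N_FFT // 2 - N_BINS // 2
    frames = np.stack([
        np.fft.fftshift(np.abs(np.fft.fft(
            baseband[i * HOP: i * HOP + N_FFT])))[lo: lo + N_BINS]
        for i in range(n_frames)
    ])                                   # shape: (time, N_BINS)
    return frames.astype(np.float32)

class TinyTCN(nn.Module):
    """Stack of dilated causal 1-D convolutions over the frame sequence."""
    def __init__(self, n_in=N_BINS, n_hid=8, n_out=N_CLASSES):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = n_in
        for d in (1, 2, 4, 8):           # growing dilation widens the context
            self.convs.append(nn.Conv1d(ch, n_hid, kernel_size=3,
                                        dilation=d, padding=2 * d))
            ch = n_hid
        self.head = nn.Linear(n_hid, n_out)

    def forward(self, x):                # x: (batch, N_BINS, time)
        T = x.shape[-1]
        h = x
        for conv in self.convs:
            # symmetric padding of 2*d plus trimming the tail == causal conv
            h = torch.relu(conv(h)[..., :T])
        return self.head(h.mean(dim=-1))  # pool over time -> class logits

if __name__ == "__main__":
    mic = np.random.randn(FS)                    # 1 s placeholder capture
    frames = doppler_frames(mic)                 # (time, N_BINS)
    model = TinyTCN()
    # A few thousand parameters, i.e. the same low-complexity regime as the
    # ~5K-parameter model the abstract reports.
    print("parameters:", sum(p.numel() for p in model.parameters()))
    x = torch.from_numpy(np.ascontiguousarray(frames.T)).unsqueeze(0)
    print(model(x).shape)                        # torch.Size([1, 3])
```

The choice of dilated causal convolutions is what keeps such a model cheap: each doubling of the dilation doubles the temporal receptive field without adding parameters, which is consistent with the low parameter and MOP counts the abstract claims for per-inference compute.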
