RadioMic: Sound Sensing via mmWave Signals

Voice interfaces have become an integral part of our lives with the proliferation of smart devices. Today, IoT devices rely mainly on microphones to sense sound. Microphones, however, have fundamental limitations, such as weak source separation, limited range in the presence of acoustic insulation, and vulnerability to multiple side-channel attacks. In this paper, we propose RadioMic, a radio-based sound sensing system that mitigates these issues and enriches sound applications. RadioMic reconstructs sound from tiny vibrations of active sources (e.g., a speaker or a human throat) or object surfaces (e.g., a paper bag), and can work through walls, even soundproof ones. To convert the extremely weak sound vibrations captured in radio signals into sound signals, RadioMic introduces radio acoustics and presents training-free approaches for robust sound detection and high-fidelity sound recovery. It then uses a neural network to further enhance the recovered sound by expanding the recoverable frequency range and reducing noise. RadioMic translates massive amounts of online audio into synthesized data to train the network, and thus minimizes the need for RF data. We thoroughly evaluate RadioMic under different scenarios using a commodity mmWave radar. The results show that RadioMic significantly outperforms state-of-the-art systems. We believe RadioMic opens new horizons for sound sensing and will inspire attractive new sensing capabilities for mmWave devices.
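
The basic physical step behind radio-based sound sensing can be illustrated with a short sketch. It is generic phase-based vibrometry, not RadioMic's radio-acoustics pipeline, which the abstract does not detail. A surface displacement d changes the round-trip phase of the reflector's range bin by 4πd/λ, so unwrapping that phase and scaling it by λ/(4π) yields the vibration waveform. The sketch assumes a single dominant reflector with no static clutter, a carrier near 77 GHz (λ ≈ 3.9 mm), and an 8 kHz slow-time sampling rate. All function names and parameters are hypothetical.

```python
import cmath
import math
import random

WAVELENGTH_M = 3.9e-3  # assumption: ~77 GHz mmWave carrier
FS_HZ = 8000.0         # assumption: slow-time (per-chirp) sampling rate


def simulate_range_bin(n, tone_hz, amp_m, phase0=3.13, noise_std=1e-3, seed=0):
    """Hypothetical model: complex samples of one range bin for a single
    reflector vibrating sinusoidally, plus complex Gaussian noise (no clutter)."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        d = amp_m * math.sin(2 * math.pi * tone_hz * i / FS_HZ)
        phase = phase0 + 4 * math.pi * d / WAVELENGTH_M  # round-trip phase shift
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        samples.append(cmath.exp(1j * phase) + noise)
    return samples


def unwrap(phases):
    """Remove 2*pi jumps between consecutive wrapped phase samples."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi
        elif delta < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


def recover_displacement(samples):
    """Phase -> displacement via d = lambda * phi / (4*pi), with the mean removed."""
    phi = unwrap([cmath.phase(s) for s in samples])
    disp = [p * WAVELENGTH_M / (4 * math.pi) for p in phi]
    mean = sum(disp) / len(disp)
    return [x - mean for x in disp]


def dominant_frequency(signal):
    """Frequency (Hz) of the largest non-DC DFT bin (naive O(n^2) DFT)."""
    n = len(signal)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * FS_HZ / n
```

For example, `recover_displacement(simulate_range_bin(400, 440.0, 5e-6))` returns a waveform whose dominant frequency is 440 Hz with an amplitude of roughly 5 µm. The default `phase0=3.13` deliberately places the phase near ±π, so the unwrapping step is exercised. Real radar returns also contain static clutter in the same range bin, which shifts the phase center. That is one of the issues a practical system such as RadioMic must handle and this sketch ignores.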
