Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer

Motion sensors on current smartphones have been exploited for audio eavesdropping due to their sensitivity to vibrations. However, this threat is considered low-risk because of two widely acknowledged limitations. First, unlike microphones, motion sensors can only pick up speech signals traveling through a solid medium; thus, the only feasible setup reported previously uses a smartphone gyroscope to eavesdrop on a loudspeaker placed on the same table. Second, it is commonly believed that these sensors can only pick up a narrow band (85-100Hz) of speech signals due to a sampling ceiling of 200Hz. In this paper, we revisit the threat of motion sensors to speech privacy and propose AccelEve, a new side-channel attack that employs a smartphone's accelerometer to eavesdrop on the speaker in the same smartphone. Specifically, it uses the accelerometer measurements to recognize the speech emitted by the speaker and to reconstruct the corresponding audio signals. In contrast to previous works, our setup allows the speech signals to always produce strong responses in accelerometer measurements through the shared motherboard, which addresses the first limitation and allows this kind of attack to penetrate into real-life scenarios. Regarding the sampling rate limitation, contrary to the widely held belief, we observe sampling rates of up to 500Hz in recent smartphones, which almost covers the entire fundamental frequency band (85-255Hz) of adult speech. Building on these pivotal observations, we propose a novel deep-learning-based system that learns to recognize and reconstruct speech information from the spectrogram representation of acceleration signals. This system employs adaptive optimization on deep neural networks with skip connections, using robust and generalizable losses to achieve robust recognition and reconstruction performance. Extensive evaluations demonstrate the effectiveness and high accuracy of our attack under various settings.
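The sampling-rate argument in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not part of AccelEve: it assumes an ideal sampler, for which the highest recoverable frequency is half the sampling rate (the Nyquist limit), and the helper names `nyquist_hz` and `band_coverage` are hypothetical. It estimates what fraction of the 85-255Hz fundamental frequency band falls below that limit at 200Hz and at 500Hz.

```python
# Illustrative sketch (not the authors' code): how much of the adult
# fundamental frequency band (85-255 Hz, as quoted in the abstract)
# lies below the Nyquist limit of a given sensor sampling rate.

def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest frequency an ideal sampler can represent without aliasing."""
    return sampling_rate_hz / 2.0


def band_coverage(sampling_rate_hz: float,
                  band_lo_hz: float = 85.0,
                  band_hi_hz: float = 255.0) -> float:
    """Fraction of [band_lo_hz, band_hi_hz] below the Nyquist limit."""
    top = min(nyquist_hz(sampling_rate_hz), band_hi_hz)
    covered = max(0.0, top - band_lo_hz)
    return covered / (band_hi_hz - band_lo_hz)


if __name__ == "__main__":
    for rate in (200.0, 500.0):
        print(f"{rate:.0f} Hz sampling: Nyquist limit {nyquist_hz(rate):.0f} Hz, "
              f"band coverage {band_coverage(rate):.1%}")
```

Under these assumptions, a 200Hz ceiling reaches only the 85-100Hz slice of the band, while 500Hz reaches 250Hz, just short of the 255Hz upper edge, which is consistent with the "almost covers" wording above.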
