SeVI: Boosting Secure Voice Interactions with Smart Devices

Voice interaction, an emerging human-computer interaction method, has gained great popularity, especially on smart devices. However, because voice signals propagate openly through the air, voice interaction may leak private information. In this paper, we propose a novel scheme, called SeVI, to protect voice interaction from deliberate or unintentional eavesdropping. While a user is interacting with his/her device by voice, SeVI actively generates jamming noise with superior characteristics so that attackers cannot recover the content of the user's speech. Meanwhile, the device leverages its prior knowledge of the generated noise to adaptively cancel the noise it receives, even when the usage environment changes due to movement, so that the user's voice interactions are unaffected. SeVI relies only on a standard microphone and speakers and can be implemented as lightweight software. We have implemented SeVI on a commercial off-the-shelf (COTS) smartphone and conducted extensive real-world experiments. The results demonstrate that SeVI can defend against both online eavesdropping attacks and offline digital signal processing (DSP) analysis attacks.
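The abstract does not say which adaptive filter SeVI uses to cancel its own jamming noise. The sketch below illustrates the general idea with a textbook normalized LMS (NLMS) canceller, a swapped-in generic technique rather than the authors' method: because the device knows the noise it plays, it can learn the speaker-to-microphone path and subtract the predicted noise from what the microphone records. Everything here is an assumption for illustration: the 3-tap path `PATH`, the 440 Hz tone standing in for speech, Gaussian noise standing in for the jamming signal, and the hypothetical names `nlms_cancel`, `mse`, and `simulate` with their parameter values.

```python
import math
import random

# Hypothetical speaker-to-microphone impulse response (an assumption for this
# sketch; a real acoustic path is unknown and changes as the device moves).
PATH = [0.6, 0.3, -0.1]


def nlms_cancel(mic, ref, taps=8, mu=0.1, eps=1e-6):
    """Remove the part of `mic` that is linearly predictable from the known
    reference `ref` (the self-generated jamming noise) with an NLMS filter.

    Returns (cleaned, weights); cleaned[n] = mic[n] - y[n] is the residual,
    which approximates the user's voice once the filter has converged.
    """
    w = [0.0] * taps    # adaptive estimate of the speaker-to-mic path
    buf = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]                            # shift in newest sample
        y = sum(wi * bi for wi, bi in zip(w, buf))      # predicted noise at mic
        e = d - y                                       # voice + leftover noise
        step = mu * e / (eps + sum(b * b for b in buf))  # normalized step size
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        cleaned.append(e)
    return cleaned, w


def mse(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def simulate(n=20000, fs=16000, seed=0):
    """Mix a stand-in 'voice' (a 440 Hz tone) with jamming noise that reaches
    the microphone through PATH, then cancel it using the known noise."""
    rng = random.Random(seed)
    voice = [0.1 * math.sin(2 * math.pi * 440 * t / fs) for t in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # known to the device
    received = [
        voice[t] + sum(h * noise[t - k] for k, h in enumerate(PATH) if t >= k)
        for t in range(n)
    ]
    cleaned, w = nlms_cancel(received, noise)
    return voice, received, cleaned, w


if __name__ == "__main__":
    voice, received, cleaned, w = simulate()
    half = len(voice) // 2  # score only the second half, after convergence
    print("noise power before cancellation:", mse(received[half:], voice[half:]))
    print("residual noise after cancellation:", mse(cleaned[half:], voice[half:]))
    print("estimated path:", [round(x, 3) for x in w[:len(PATH)]])
```

Because the weights are updated on every sample, a canceller of this kind keeps re-estimating the path, which is the property that lets it follow a changing environment. An eavesdropper hears the same mixture but lacks the reference noise, so it cannot perform this subtraction. The user's own speech acts as a disturbance to adaptation, so a small step size `mu` trades slower convergence for lower residual noise.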
