Practical Speech Re-use Prevention in Voice-driven Services

Voice-driven services (VDS) are used in a variety of applications, ranging from smart home control to payments made through digital assistants. The input to such services is often captured via an open voice channel, e.g., a microphone, in an unsupervised setting. One of the key operational security requirements in such a setting is the freshness of the input speech. We present AEOLUS, a security overlay that proactively embeds a dynamic acoustic nonce at the time of user interaction and detects the presence of the embedded nonce in the recorded speech to ensure freshness. We demonstrate that an acoustic nonce can (i) be reliably embedded and retrieved, and (ii) be non-disruptive (and even imperceptible) to a VDS user. Optimal parameters (the acoustic nonce's operating frequency, amplitude, and bitrate) are determined for (i) and (ii) from a practical perspective. Experimental results show that AEOLUS yields a 0.5% false rejection rate (FRR) at a 0% false acceptance rate (FAR) for speech re-use prevention up to a distance of 4 meters in three real-world environments with different background noise levels. We also conduct a user study with 120 participants, which shows that the acoustic nonce does not degrade the overall user experience for 94.16% of speech samples, on average, in these environments. AEOLUS can therefore be used in practice to prevent speech re-use and ensure the freshness of speech input.
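To make the embed-then-detect idea concrete, the following is a minimal sketch, not AEOLUS's actual scheme: the abstract does not specify the modulation, so this assumes binary frequency-shift keying (FSK) with a Goertzel detector. The sample rate, tone frequencies, bitrate, amplitude, and the bit-error threshold `max_ber` are hypothetical placeholders for the parameters the paper tunes; the helper names (`new_nonce`, `embed`, `decode`, `is_fresh`) are illustrative only.

```python
import math
import random
import secrets

# Hypothetical parameters; AEOLUS selects its own frequency, amplitude, and bitrate.
FS = 16_000                 # sample rate in Hz (assumed)
BITRATE = 50                # nonce bits per second (assumed)
F0, F1 = 4_000.0, 4_500.0   # FSK tones for bit 0 and bit 1 (assumed)
AMPLITUDE = 0.02            # nonce amplitude relative to full scale (assumed)
SPB = FS // BITRATE         # samples per nonce bit (320 here)


def new_nonce(n_bits=32):
    """Draw a fresh random nonce for one user interaction."""
    return [secrets.randbits(1) for _ in range(n_bits)]


def embed(speech, nonce):
    """Add a binary-FSK rendering of the nonce to the speech samples."""
    out = list(speech)
    for i, bit in enumerate(nonce):
        f = F1 if bit else F0
        for n in range(i * SPB, min((i + 1) * SPB, len(out))):
            out[n] += AMPLITUDE * math.sin(2 * math.pi * f * n / FS)
    return out


def goertzel_power(block, freq):
    """Signal power of `block` at `freq` via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / FS)
    s1 = s2 = 0.0
    for x in block:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode(recording, n_bits):
    """Recover one bit per window by comparing tone power at F1 vs F0."""
    bits = []
    for i in range(n_bits):
        block = recording[i * SPB:(i + 1) * SPB]
        bits.append(1 if goertzel_power(block, F1) > goertzel_power(block, F0) else 0)
    return bits


def is_fresh(recording, expected, max_ber=0.1):
    """Accept only if the session's nonce is recovered with a low bit error rate."""
    decoded = decode(recording, len(expected))
    errors = sum(a != b for a, b in zip(decoded, expected))
    return errors / len(expected) <= max_ber
```

A replayed recording carries either no nonce or the nonce of an earlier session, so decoding it against the current session's nonce yields roughly 50% bit errors and `is_fresh` rejects it. With the assumed values, each bit window holds a whole number of cycles of both tones, which keeps the two Goertzel bins from leaking into each other.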