The Audio Auditor: User-Level Membership Inference in Internet of Things Voice Services

Abstract With the rapid development of deep learning techniques, the popularity of voice services implemented on various Internet of Things (IoT) devices is ever increasing. In this paper, we examine user-level membership inference in the problem space of voice services, by designing an audio auditor to verify whether a specific user had unwillingly contributed audio used to train an automatic speech recognition (ASR) model under strict black-box access. With user representation of the input audio data and their corresponding translated text, our trained auditor is effective in user-level audit. We also observe that the auditor trained on specific data can be generalized well regardless of the ASR model architecture. We validate the auditor on ASR models trained with LSTM, RNNs, and GRU algorithms on two state-of-the-art pipelines, the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80%. We hope the methodology developed in this paper and findings can inform privacy advocates to overhaul IoT privacy.

[1]  Titouan Parcollet,et al.  The Pytorch-kaldi Speech Recognition Toolkit , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Kai Chen,et al.  Understanding Membership Inferences on Well-Generalized Learning Models , 2018, ArXiv.

[3]  Priyan Malarvizhi Kumar,et al.  An Automatic Tamil Speech Recognition system by using Bidirectional Recurrent Neural Network with Self-Organizing Map , 2019, Neural Computing and Applications.

[4]  Jonathan G. Fiscus,et al.  DARPA TIMIT:: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .

[5]  Nan Zhang,et al.  Dangerous Skills: Understanding and Mitigating Security Risks of Voice-Controlled Third-Party Functions on Virtual Personal Assistant Systems , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[6]  Kai Peng,et al.  SocInf: Membership Inference Attacks on Social Media Health Data With Machine Learning , 2019, IEEE Transactions on Computational Social Systems.

[7]  Nicholas W. D. Evans,et al.  Preserving privacy in speaker and speech characterisation , 2019, Comput. Speech Lang..

[8]  Hafiz Malik,et al.  Securing Voice-Driven Interfaces Against Fake (Cloned) Audio Attacks , 2019, 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).

[9]  Jiamin Wang,et al.  Read Between the Lines: An Empirical Measurement of Sensitive Applications of Voice Personal Assistant Systems , 2020, WWW.

[10]  Sanjeev Khudanpur,et al.  Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Maya Cakmak,et al.  Toys that Listen: A Study of Parents, Children, and Internet-Connected Toys , 2017, CHI.

[12]  Rayid Ghani,et al.  Aequitas: A Bias and Fairness Audit Toolkit , 2018, ArXiv.

[13]  Suchi Saria,et al.  Can You Trust This Prediction? Auditing Pointwise Reliability After Learning , 2019, AISTATS.

[14]  Percy Liang,et al.  Understanding Black-box Predictions via Influence Functions , 2017, ICML.

[15]  Chao Chen,et al.  The Audio Auditor: Participant-Level Membership Inference in Voice-Based IoT , 2019, ArXiv.

[16]  Virgílio A. F. Almeida,et al.  The Right to be Forgotten in the Media: A Data-Driven Study , 2016, Proc. Priv. Enhancing Technol..

[17]  Marc Tommasi,et al.  Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion? , 2019, INTERSPEECH.

[18]  Yu-Chih Tung,et al.  Exploiting Sound Masking for Audio Privacy in Smartphones , 2019, AsiaCCS.

[19]  Dorothea Kolossa,et al.  Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding , 2018, NDSS.

[20]  Srinivas Bangalore,et al.  Personalized speech recognition for Internet of Things , 2015, 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT).

[21]  Reza Shokri,et al.  Machine Learning with Membership Privacy using Adversarial Regularization , 2018, CCS.

[22]  Luis A. Hernández Gómez,et al.  Exploring Open-Source Deep Learning ASR for Speech-to-Text TV program transcription , 2018, IberSPEECH.

[23]  Ting Wang,et al.  SirenAttack: Generating Adversarial Audio for End-to-End Acoustic Systems , 2019, AsiaCCS.

[24]  Vitaly Shmatikov,et al.  Auditing Data Provenance in Text-Generation Models , 2018, KDD.

[25]  Prasenjit Dey,et al.  End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention , 2018, INTERSPEECH.

[26]  John A. Weaver,et al.  And What Will You Do With It , 2011 .

[27]  Björn W. Schuller,et al.  Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.

[28]  Paul Deléglise,et al.  TED-LIUM: an Automatic Speech Recognition dedicated corpus , 2012, LREC.

[29]  H. Ney,et al.  VTLN-based voice conversion , 2003, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795).

[30]  Jianwei Qian,et al.  Speech Sanitizer: Speech Content Desensitization and Voice Anonymization , 2021, IEEE Transactions on Dependable and Secure Computing.

[31]  Suresh Venkatasubramanian,et al.  Auditing black-box models for indirect influence , 2016, Knowledge and Information Systems.

[32]  Mohamed Ali Kaafar,et al.  Modelling and Quantifying Membership Information Leakage in Machine Learning , 2020, ArXiv.

[33]  Kai Chen,et al.  Devil's Whisper: A General Approach for Physical Adversarial Attacks against Commercial Black-box Speech Recognition Devices , 2020, USENIX Security Symposium.

[34]  Somesh Jha,et al.  Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting , 2017, 2018 IEEE 31st Computer Security Foundations Symposium (CSF).

[35]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[36]  Lin-Shan Lee,et al.  Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[37]  Prateek Mittal,et al.  Privacy Risks of Securing Machine Learning Models against Adversarial Examples , 2019, CCS.

[38]  Mario Fritz,et al.  ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models , 2018, NDSS.

[39]  Emiliano De Cristofaro,et al.  LOGAN: Membership Inference Attacks Against Generative Models , 2017, Proc. Priv. Enhancing Technol..

[40]  Eamonn J. Keogh,et al.  Discovery of Meaningful Rules in Time Series , 2015, KDD.

[41]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).