Paper Information - Understanding the Tradeoffs in Client-Side Privacy for Speech Recognition

Existing approaches to ensuring privacy of user speech data primarily focus on server-side approaches. While improving server-side privacy reduces certain security concerns, users still do not retain control over whether privacy is ensured on the client-side. In this paper, we define, evaluate, and explore techniques for client-side privacy in speech recognition, where the goal is to preserve privacy on raw speech data before leaving the client’s device. We first formalize several tradeoffs in ensuring client-side privacy between performance, compute requirements, and privacy. Using our tradeoff analysis, we perform a large-scale empirical study on existing approaches and find that they fall short on at least one metric. Our results call for more research in this crucial area as a step towards safer real-world deployment of speech recognition systems at scale across mobile devices.

Ruslan Salakhutdinov | Louis-Philippe Morency | Paul Pu Liang | Peter Wu

[1] Rita Singh, et al. Profiling Humans from their Voice, 2019.

[2] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3] Parameswaran Ramanathan, et al. Prεεch: A System for Privacy-Preserving Speech Transcription, 2019, USENIX Security Symposium.

[4] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[5] Alexei Baevski, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.

[6] Ruslan Salakhutdinov, et al. Think Locally, Act Globally: Federated Learning with Local and Global Representations, 2020, ArXiv.

[7] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.

[8] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, ArXiv.

[9] Jie Xu, et al. Federated Learning for Healthcare Informatics, 2019, ArXiv.

[10] H. Haddadi, et al. Privacy-preserving Voice Analysis via Disentangled Representations, 2020, CCSW@CCS.

[11] Marc Tommasi, et al. Design Choices for X-vector Based Speaker Anonymization, 2020, INTERSPEECH.

[12] Hung-yi Lee, et al. One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization, 2019, INTERSPEECH.

[13] Joseph Dureau, et al. Federated Learning for Keyword Spotting, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14] Junichi Yamagishi, et al. SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, 2016.

[15] Feifei Li, et al. Privacy-Preserving Outsourced Speech Recognition for Smart IoT Devices, 2019, IEEE Internet of Things Journal.

[16] Lin-Shan Lee, et al. Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations, 2018, INTERSPEECH.

[17] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.

[18] Ian McGraw, et al. Personalized speech recognition on mobile devices, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19] Tomoki Toda, et al. The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS, 2020, Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020.

[20] Olof Mogren, et al. Adversarial representation learning for private speech generation, 2020, ArXiv.

[21] Serge J. Belongie, et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22] Xin Qin, et al. FedHealth: A Federated Transfer Learning Framework for Wearable Healthcare, 2019, IEEE Intelligent Systems.

[23] Tassilo Klein, et al. Differentially Private Federated Learning: A Client Level Perspective, 2017, ArXiv.

[24] E. Vincent, et al. Introducing the VoicePrivacy Initiative, 2020, INTERSPEECH.