Voice-Indistinguishability: Protecting Voiceprint In Privacy-Preserving Speech Data Release

With the development of smart devices, such as the Amazon Echo and Apple’s HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker’s identifiable information, i.e., voiceprint, which is considered a type of biometric identifier. Current studies on voiceprint privacy protection do not provide either a meaningful privacy-utility trade-off or a formal and rigorous definition of privacy. In this study, we design a novel and rigorous privacy metric for voiceprint privacy, which is referred to as voice-indistinguishability, by extending differential privacy. We also propose mechanisms and frameworks for privacy-preserving speech data release satisfying voice-indistinguishability. Experiments on public datasets verify the effectiveness and efficiency of the proposed methods.

[1]  Catuscia Palamidessi,et al.  Geo-indistinguishability: differential privacy for location-based systems , 2012, CCS.

[2]  Joon Son Chung,et al.  VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.

[3]  Linlin Chen,et al.  Hidebehind: Enjoy Voice Input with Voiceprint Unclonability and Anonymity , 2018, SenSys.

[4]  Aaron Lawson,et al.  The Speakers in the Wild (SITW) Speaker Recognition Database , 2016, INTERSPEECH.

[5]  Simon Dobrisek,et al.  Speaker de-identification using diphone recognition and speech synthesis , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[6]  Junichi Yamagishi,et al.  Speaker Anonymization Using X-vector and Neural Waveform Models , 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).

[7]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[8]  Sanjeev Khudanpur,et al.  A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.

[9]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[11]  Paul Rad,et al.  Voice biometrics: Deep learning-based voiceprint authentication system , 2017, 2017 12th System of Systems Engineering Conference (SoSE).

[12]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[13]  Ira Kemelmacher-Shlizerman,et al.  Synthesizing Obama , 2017, ACM Trans. Graph..

[14]  Marc Tommasi,et al.  Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion? , 2019, INTERSPEECH.

[15]  Catuscia Palamidessi,et al.  Broadening the Scope of Differential Privacy Using Metrics , 2013, Privacy Enhancing Technologies.

[16]  Haizhou Li,et al.  Spoofing and countermeasures for speaker verification: A survey , 2015, Speech Commun..

[17]  Sanjeev Khudanpur,et al.  Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[19]  Martin M. Merener Theoretical Results on De-Anonymization via Linkage Attacks , 2012, Trans. Data Priv..

[20]  Sanjeev Khudanpur,et al.  X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  Erik McDermott,et al.  Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[22]  Sergey Novoselov,et al.  On deep speaker embeddings for text-independent speaker recognition , 2018, Odyssey.

[23]  Patrick Kenny,et al.  A Study of Interspeaker Variability in Speaker Verification , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Liyue Fan,et al.  Practical Image Obfuscation with Provable Privacy , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[25]  Joon Son Chung,et al.  VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.

[26]  Shinji Watanabe,et al.  ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.

[27]  Isabel Trancoso,et al.  The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps towards a Common Understanding , 2019, INTERSPEECH.

[28]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[29]  Marc Tommasi,et al.  Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).